Universal Access in the Information Society, Volume 8, Issue 3, pp 137–153

e-Document management in situated interactivity: the WIL approach


  • Paolo Bottoni
    • Dipartimento di Informatica, Università di Roma “La Sapienza”
  • Fernando Ferri
  • Patrizia Grifoni
  • Andrea Marcante
    • ITC-CNR Unità Staccata di Milano
  • Piero Mussio
    • DICO, Università degli Studi
    • ITC-CNR Unità Staccata di Milano
  • Amanda Reggiori
    • FSLLS, Università Cattolica del Sacro Cuore
Long Paper

DOI: 10.1007/s10209-008-0142-z

Cite this article as:
Bottoni, P., Ferri, F., Grifoni, P. et al. Univ Access Inf Soc (2009) 8: 137. doi:10.1007/s10209-008-0142-z


Complex organizations need to manage a large amount of information that their employees produce and use in the form of documents; information systems are therefore adopted to access these documents in electronic format (e-documents) through an intranet or the Internet. These documents are composed, organized and annotated in different ways according to the rules adopted by specific professional communities. Such rules reflect the particular culture and skills of the communities producing them. The large amount of information available today can potentially be accessed in real time. This has increased the need for syntactic and semantic characterization of documents and for tools that support their effective access and exploitation on the Net, their easy retrieval and management, and their annotation, so that they can be adapted and personalized on the basis of users’ characteristics and diversity. This paper describes the approach adopted for the Web Indexing Language (WIL) system, a system conceived to support user interactivity during the editing, indexing and annotating of e-documents on the basis of the conventions adopted for their production and distribution. In particular, the approach capitalizes on the notion that the document layout reflects the relationships among the different semantic components of the document. The model and the architecture of the WIL system aim at improving e-document indexing, searching, editing and annotating, and at exploiting the description of the logical structure of the document to extract the information about the document content that a reader usually grasps at a glance.


Keywords: HCI, Interactivity, XML, e-Document management

1 Introduction

The development of network communication has fostered the use of the Web as an environment and a user interface for publishing and accessing information and, more generally, resources available on the Internet.

In such a scenario, complex organizations (e.g., public institutions and private companies) with large communities of employees, structured in specialized departments, need support for their information-intensive activities, such as collecting, cataloguing, browsing, annotating, fragmenting, assembling, packaging and re-distributing a large amount of networked information. In these organizations, it is customary to archive, manage and exploit large quantities of documents that, unfortunately, are often stored on different media (paper-based, digital, audiovisual) and are usually presented in different formats suited to their different goals (definition of the organization and of the internal procedures to follow, formalization of the relationships with customers, product description and documentation of production processes, technical documentation). Today, document exploitation is fundamental to feeding the knowledge that an organization relies on in its daily practice, both for self-presentation and as a reference for executing new projects and facing new situations.

The variety of formats and objectives involved in the production of such documents often makes it difficult to build knowledge bases that are specialized enough to become knowledge for the organization and that can be suitably used by all its members, who should be able to retrieve and manipulate resources in an associative way, depending on the context of use.

In their daily activity, employees cooperate with their colleagues by exchanging documents within their organization or within the project they are involved in. These documents are structured and authored by applying explicit or implicit notational rules (here intended as the secondary notation [1]). These rules help a community of practice to recognize the information in the documents. In cooperative work, annotation is a basic operation that makes it possible to extend documents with informal and subjective knowledge: annotation contents are often shared with colleagues in order to enhance common knowledge, and often also to let the authoring of the document itself progress by common agreement. Annotation and markup are well-known document indexing methods. Effective and efficient computer support for these activities is hampered by the fact that people who cooperate on the same task but belong to different companies, or to different departments of the same company (e.g., accounting, design, store, engineering), adopt different languages and different notational rules, which conform to their specific skills and tasks. Communication gaps in collaborative work and in document management arise as a consequence. Therefore, people need suitable facilities to easily customize documents to their language and to the notational rules they are accustomed to.

The evolution of documents in electronic form (e-documents) and the diffusion of Web-based systems suffer from these communication gaps.

As an example, a paper-based document is usually shaped according to agreed conventions that establish how to structure the different paragraphs and pieces of information, so that any individual belonging to the author's community, or to a similar one, will recognize at a glance what kind of document it is (e.g., a technical report or a set of instructions to accomplish a task), and will be able to distinguish it from others and locate its most interesting parts. This suggests that shape recognition enables a user to identify pieces of information of direct interest and that visual highlighting makes it easier to identify a document among others.

Electronic documents increase the possibility of collaboration among people belonging to different communities. However, if a common recognition of shape is not agreed upon, documents will be produced and circulated in which the physical layout and the logical grouping of information are arbitrarily chosen by the author, making it difficult to index them visually at a glance.

The basic assumption behind this paper is that automating visual indexing improves document retrieval. Towards this objective, this paper proposes the methodology and the architecture of the Web Indexing Language (WIL) system for editing, annotating, indexing and retrieving e-documents. This approach starts from the Human–Computer Interaction (HCI) model developed in [2] and capitalizes on the description of the logical structure of a document as presented in [3].

The WIL approach exploits the idea that documents can be edited, classified and annotated depending not only on their textual content but also on their formatting [4]. In particular, some spatial relationships emerging from the document organization, i.e., the document layout, actually express semantic relationships among document components. However, their interpretation may differ for different users, for different purposes of use, or according to the cognitive criteria and the ways adopted to represent ideas, the results of activities, and the procedures followed to obtain them. The layout is taken to indicate the grouping of information elements within a single user-interface element, namely, the browser window in which a Web document is presented.

The WIL system aims at improving precision and recall with respect to general-purpose search engines by including semantic meta-data drawn from a document’s presentation, from its content, and from user interaction. In particular, WIL explores the possibility of developing visual languages for augmenting a document with annotations and descriptions of its content and for retrieving these annotations and descriptions from a repository. The languages will be specific to a community (e.g., a company) and therefore shared by all its members, or to restricted groups in the community, or even defined for personal annotations.

In the case of indexing and retrieving, the WIL approach couples traditional retrieval by keywords with the possibility to formulate queries based on the document logical structure, on its presentation, or on the content of images therein. This is intended to overcome typical limitations of current techniques for document retrieval, which are based on keywords and which do not take into account the linguistic (concerning the peculiarities of the domain language of the users) and the visual (referred to the document layout) context. In the WIL approach, context information is provided by defining also the structure, the layout and the organizational elements of e-documents, which differ according to the different working environments in which they are produced, managed and annotated.

The notational rules, as established within a user community, are formalized as visual languages and managed through appropriate tools: the formalization indicates which semantic relations among individual information objects in an e-document are expressed through spatial relations among their visual representations. WIL focuses on visual languages to offer interaction modes customized to the work environments, so as to enable a user to express and communicate content for retrieving, classifying and annotating a document or one of its fragments. As an example, a sketching component is integrated to enable users to represent, in a rapid and approximate way, the information they are looking for. Such functionalities require a system architecture with a meta-level enabling the definition of visual languages adapted to the communication needs of their users. The visual representations, the meanings conventionally adopted by a community and the meanings locally defined and used by groups within the community are stored in different structures. The meta-level definition also includes the specification of the spatial relationships among the visual representations of the document content. Hence, specific operators can be derived that enforce the use of legal arrangements of these relationships during document creation.

The paper is organized as follows: in Sect. 2, the current evolution of documents and related user issues are presented, while Sect. 3 discusses related work. The interaction between human and computer systems to manage e-documents on the Web is analyzed in Sect. 4. Sect. 5 presents a description of web documents in terms of Information Objects, and Sect. 6 introduces the problem of managing multimedia information objects in electronic documents. Sect. 7 presents the proposed approach to annotating e-documents. Sect. 8 outlines the functional architecture of the WIL system. Sect. 9 concludes the paper with some final considerations.

2 From paper-based to electronic documents

In the electronic world, documents have progressively been converted into e-documents, i.e., documents that exist ‘virtually’ as the result of the interpretation of a program P by a computer. Even if e-documents inherit some features from their paper-based ancestors, they have deeply changed the way documents can be used, handled, accessed, modified and annotated [5].

A paper-based document is indeed a passive medium, whose content can be highlighted but not easily manipulated, updated or extended, either by hand or by a computer. As a consequence, digital methods for manipulating (e.g., simply rearranging the layout when a new paragraph is inserted, or detecting and correcting misspellings), communicating, storing and processing information can be exploited only through suitable devices and procedures. A paper-based document is an artifact consisting of a physical support suitably modified by human activity. It becomes a document when it is used to record a message, an idea, a concept, a story or data to be communicated for different purposes: as a tool for study, reference or research during an activity, or simply for disseminating information. The users of the document interpret it by applying their own cognitive criteria and by recognizing sets of elementary signs on the document’s support. Which arrangements of signs are significant to readers may depend on the perceptual properties of the signs, but also on the contextual interpretation of their functional role.

Communities of practice may develop notations in order to represent on a permanent support (namely, to materialize) abstract and concrete concepts, data, instructions, procedures, strategies, results of activities, etc. Such notations are drawn from the activities and experiences of the community, and are defined by
  • a finite set of simple visual elements (the notation alphabet);

  • a set of lexical, syntactic and layout rules to assemble the simple elements into more complex ones, until the document is completed;

  • a set of methods to organize, model and materialize, onto suitable supports, the set of pixels which are recognized by the user;

  • a set of mappings and interpretations to associate simple and complex elements with meanings, so that a document can be organized to transmit the intended meaning or be correctly interpreted.

Users read (i.e., perceive and interpret) and annotate the document following reading patterns: their comprehension is constrained by their level of knowledge of the notation used by the author of the document. Pre-electronic documents are stable in time, although they may change due to human intervention or to degradation of the support. They are nevertheless considered closed systems, which do not change over time and which play a passive role in the operations that users perform on them.

In the electronic world, documents have a virtual existence, which is the result of the computer interpretation of a program and of a set of data. Documents become e-documents: dynamic entities that evolve over time, are able to interact with their users and to execute tasks supporting them in their activities, and are presented as integrated configurations of multimedia fragments (texts, sketches, icons, images). On the Web, e-documents evolve into “a unit consisting of dynamic, flexible, non-linear content, represented as a set of linked information items, stored in one or more physical media or networked sites; created and used by one or more individuals in the facilitation of some process or project” [6].

e-Documents are often the product of knowledge acquisition, in a human-understandable form that integrates informal multimedia knowledge representation with formal computational structures. Unfortunately, the visual presentation of symbolic knowledge, either in diagrammatic form or through simple arrangements of icons, is not straightforward, because the expression of such knowledge in a formal language is usually not easily understood by non-experts. End users usually need skilled support in expressing their tacit and explicit knowledge and in representing it in a formal language.

This paper approaches e-document authoring as a process distinguished from that of its materialization: e-documents have to be moulded so that all the users, not necessarily computer experts, can perceive, understand and handle them in a useful way. e-Documents are reactive, i.e., they react to proper interactions by activating computations and presenting results, and possibly pro-active, i.e., they activate special computations without an explicit request from the user. In this sense, e-documents are Visual Interactive Systems (VISs) that exist as the result of the activity of a program and of the interaction with the user. As VISs, e-documents include functionalities for the user to operate on the document and for the pro-active activities executed by the document itself.

Functionalities are determined by user needs in specific contexts. However, in general, a VIS should guarantee and enhance some basic user activities such as annotating, editing and indexing the documents.

3 Related work

The WIL approach considers several aspects pertaining to the design of hypermedia Web-based applications, from a formal framework for the definition of visual languages and a reconsideration of the notion of document, also taking into account several features of interactivity, to the proposal of tools ranging from annotation to information retrieval and content management. This section discusses the most relevant influences from the literature on the development of the WIL approach.

In [7], an approach is proposed to semantic document indexing by extracting conceptual content from an index which refers to document text and images. The approach is based on the technique of Latent Semantic Indexing (LSI) [8], which was conceived and used for textual information retrieval. In a complementary way, WIL aims at extracting conceptual content from text indexing while considering its spatial positioning in the document presentation. In [7], LSI is used to uncover the semantic correlation between keywords in the document title and image features in the same web document, in order to improve the retrieval of multimedia web documents. The LSI approach manages concepts, so that it can also retrieve documents that do not contain the keyword specified in the query but contain equivalent keywords describing the same concept. LSI also manages synonymy and polysemy: it allows users from different contexts, or with different needs, knowledge or linguistic habits, to describe the same concept using different terms, and it helps resolve ambiguities arising from terms used with different meanings in different contexts. Textual indexing is coupled with global image features extracted from the color histogram and from the color anglogram, a spatial color indexing scheme [9, 10].

In [11], document management is based on document properties, rather than on document organization as in WIL. Properties are the primary, uniform means for organizing, grouping, managing, controlling and retrieving documents; they are meaningful to users and can express system activities, such as sharing criteria, replication management and versioning, therefore enabling the provision of document-based services on a property infrastructure. Users assign to documents both informative, or static, properties and effective, or active, properties, which are activated to control or augment the document functionality. Besides traditional meta-data items, properties include user categorizations, keywords, links to related items, and content-based descriptors. A new paradigm for document management infrastructures is defined and explored through an experimental prototype, the Placeless Documents system, designed on the basis of three core features: uniform interaction, user-specific properties, and active properties.

An interesting exploration of how the digital revolution has changed paper as a communication medium is given in [12]. Paper-based documents are compared with user interfaces, outlining commonalities that lead to a sketch of a paper-based user interaction approach to user-interface design, which preserves the traditional conventions adopted in document management.

Gaines and Shaw [13, 14] have developed an active document technology in which a word processor embeds graphics, hypertext links and active components, including semantic network functionalities. This has been applied to developing procedure manuals for a corporation, as well as to modeling organizational rules and constraints so that they can be readily understood, updated and used in the organization’s information systems.

Pedauque [15] has analyzed and reformulated the concept of document in the framework of a semiotic context where the computational system and the human interact for communication. He has proposed a three-faceted definition. First, the document is presented as an object of communication structured according to agreed rules. Second, the meaning of the document is established in relation to its context of usage and to the conventions and habits typical of the community of use. Finally, the question is raised of the document’s status as a trace of actual communication.

The terms interaction and interactivity have raised discussions in both the HCI and the Computer Mediated Communication (CMC) sectors. In [16], the communication viewpoint is compared with several approaches. Many approaches try to define the dimensions of interactivity. Monodimensional interactivity is defined as the capability of a communication system to reply like a person involved in a conversation [17, 18]. Bidimensional interactivity has been introduced as the information flow between user and document, user and computer, user and user [19]. Three-dimensional interactivity [20, 21] is characterized by three variables: frequency of interaction, choices available, and the degree of relevance of the message in the context of the human–computer communication. Four-dimensional interactivity [22] is defined as a four-tuple: the set of actions that a medium offers to the user, the possibility to modify existing content or add new content, the quantity of choices in each dimension, and the progression of the communication. Many-dimensional interactivity [23] presents interactivity in relation to features of communication technologies, taking into account several aspects, such as accessibility of the information to be presented to the user, possibility of updating it, and friendliness of communication among users.

In [24], interactivity has been defined on the basis of the relationships among the owners and suppliers of information and those who control its distribution by selecting the topics and determining the transmission timing. Consequently, four types of interactivity originate:
  • transmission, where information is produced and controlled by a centralized information provider;

  • conversation, where information is produced by information consumers who determine its distribution;

  • consultation, where information is produced and controlled by a central provider but its distribution is determined by its consumers;

  • registration, where information is produced, updated and controlled by the consumers under a centralized control.
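One way to read the four types listed above is as a two-by-two table indexed by who produces the information and who determines its distribution. The following Python sketch encodes that reading; the names `Party`, `INTERACTIVITY_TYPES` and `interactivity_type` are illustrative assumptions, not part of [24] or of WIL, and the two-axis simplification is an interpretation of the list rather than a definition taken from the source.

```python
from enum import Enum


class Party(Enum):
    """The two kinds of actors mentioned in the list (hypothetical names)."""
    CENTRAL_PROVIDER = "central provider"
    CONSUMER = "consumer"


# Assumed reading of the four types:
# (who produces the information, who determines its distribution) -> type
INTERACTIVITY_TYPES = {
    (Party.CENTRAL_PROVIDER, Party.CENTRAL_PROVIDER): "transmission",
    (Party.CONSUMER, Party.CONSUMER): "conversation",
    (Party.CENTRAL_PROVIDER, Party.CONSUMER): "consultation",
    (Party.CONSUMER, Party.CENTRAL_PROVIDER): "registration",
}


def interactivity_type(producer: Party, distributor: Party) -> str:
    """Return the interactivity type for a producer/distributor pair."""
    return INTERACTIVITY_TYPES[(producer, distributor)]
```

For instance, `interactivity_type(Party.CENTRAL_PROVIDER, Party.CONSUMER)` yields `"consultation"`, the case in which a central provider produces the information but consumers decide what is distributed and when.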

The WIL approach integrates these four types of interactivity and proposes an interaction model in which the HCI process is considered as a cyclic process and, during interaction, the user and the interactive system communicate by materializing and interpreting a sequence of messages (Fig. 1).
Fig. 1 The HCI process model: $i(t_i)$ is materialized at interaction time $t_i$

In the seminal work of [25], the human side of interaction is explored, and the existence of semantic and articulatory distances in the evaluation and execution of HCI activities is identified. More recently, the computer semiotics approach [26, 27] has brought to the center of reasoning on HCI the system of signs (visual, but also haptic and audio) used by the human and the computer to communicate (the VCL elements in the WIL language). These signs are interpreted both by the user and by the system. Humans interpret the signs within the context of their activity, and the whole interaction process depends on the pragmatic level of the communication. According to this perspective, computer artifacts expand the user’s universe of signs, allowing the traditional way in which users perform their daily tasks to evolve. Differing from [27], in WIL the computer system is not considered as a one-shot message sent from the designer to the user, but as a pro-active system involved in the double process of interpretation and materialization, which arises from interaction. On the machine side, the behavior of the computing system determined by human activities must be modeled, highlighting the problems that arise on the computer side in capturing and interpreting human actions [28]. However, the human and the system form a single system, the syndetic system, whose dynamics are determined by the activities of two subsystems of different nature, namely a cognitive system (the human) and a computing system (the computer) [29]. Reaching the goals of the interaction process requires balancing the requests of the two subsystems.

Several groups are working to provide tools for interactive production and retrieval of annotations of Web-based material, with the objective of improving conversation among users or between users and the computer system, especially in the field of text and image annotation.

In particular, medical image annotation has seen several developments, including AnnoteImage [30] and PAIS [31]. The first allows users to create, and publish on the Web, their own atlases of annotated medical images. Annotation data are maintained in text files with a proprietary syntax and retrieved via Common Gateway Interface (CGI) scripts. PAIS improves on this by using Java applets rather than CGI scripts, and by exploiting an XML-based language to describe image meta-data and annotations [32]. Annotated medical images are also kept in the server network supported by the I2Cnet system [33]. In this case, annotations, in the form of ASCII data, are kept privately and sent to other medical personnel via email, or posted on the network after a moderator’s review. A pioneering effort concerning textual annotation was Annotea (http://www.w3.org/2001/Annotea), which evolved towards annotation of multimedia material in Vannotea [34]. In this family of systems, the annotations conform to an RDF-based schema and the link to the source material is realized via XPointer. They have a client–server architecture, where the client is a specific browser. Both Microsoft and IBM have developed annotation systems, called MRAS [35] and VideoAnnEx (http://www.research.ibm.com/VideoAnnEx/), respectively.

4 Interacting with e-documents

A specific HCI process model is proposed to define the interaction between humans and e-documents. The notion of interactivity adopted in WIL presents interaction as the process followed by two or more subjects when communicating (either synchronously or asynchronously) from remote positions, by means of computational systems playing an active role in interpreting and materializing messages exchanged by the human communicants.

Communication occurs through a sequence of messages materialized as images on the screen. Users recognize them, derive the meaning of the messages, decide the next action to perform in the communication process and manifest their decisions through actions on the system’s input devices. The system captures these actions as an input event stream, interprets them with reference to the image on the screen, determines the answer to this activity and materializes the results on the screen, so that they can be perceived and interpreted by the human (see Fig. 1) [2]. Two interpretations of each message are therefore assumed to exist: one performed by the human user, the other by the computer, which interprets each user action and each message according to the rules embedded in a program P.

Users understand the meaning of HCI messages because they recognize some subsets of pixels on the screen as characteristic structures (css). Users associate a meaning with each cs, thus giving rise to a construct called user characteristic pattern (ucp). All the css perceivable in the image, including the image itself, together with their descriptions, produce what is called the user visual sentence. From the machine point of view, a cs is a set of pixels generated and managed by a computational process P activated as a consequence of the computer interpretation of some specification. The state of the interaction process at step k is described by relating the state of P with the image on the screen, defining a system visual sentence, $vs_k = \langle i_k, d_k, \langle int_k, mat_k \rangle \rangle$, where $int_k$ is a function that maps the css identified in $i_k$ into $d_k$, a multiset of attributed symbols describing the state of the interpretation process, and $mat_k$ maps elements of $d_k$ into css in $i_k$. If, in the state $vs_k$, the user generates an action on $i_k$, the reaction by P results in the creation of a new $vs_{k+1}$, whose image $i_{k+1}$ appears on the screen and whose $d_{k+1}$ describes the new state of the generating process P (Fig. 2). Each association of a cs with a symbol in $d_k$, as defined by $int_k$ or $mat_k$, is called a characteristic pattern (cp).
Fig. 2 An example of interaction: the user perceives the image $i_k$ on the screen, recognizes the menu on the right, and selects an e-document from the menu. P reacts, interpreting the action of the user and materializing the image $i_{k+1}$
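The system visual sentence and the interaction step of Fig. 2 can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the WIL implementation: the class `VisualSentence`, the function `step`, and the names `menu_entry_report`, `document_pane` and `viewer` are hypothetical; css are represented only by their names rather than by sets of pixels, and a dictionary stands in for the multiset d of attributed symbols.

```python
from dataclasses import dataclass


@dataclass
class VisualSentence:
    """Sketch of vs = <i, d, <int, mat>> (simplified, see assumptions above)."""
    image: frozenset    # i: names of the css currently on screen
    description: dict   # d: symbol -> attributes (stand-in for the multiset)
    interp: dict        # int: cs name -> symbol
    mat: dict           # mat: symbol -> cs name

    def characteristic_patterns(self):
        """Each (cs, symbol) association is a characteristic pattern (cp)."""
        return sorted((cs, sym) for cs, sym in self.interp.items()
                      if cs in self.image)


def step(vs, selected_cs):
    """Hypothetical reaction of P: interpret a selection and build vs_{k+1}."""
    symbol = vs.interp[selected_cs]                 # interpret the user action
    description = dict(vs.description)
    description["viewer"] = {"shows": symbol}       # new state of P
    mat = dict(vs.mat, viewer="document_pane")      # materialize the new symbol
    interp = {cs: sym for sym, cs in mat.items()}   # int mirrors mat here
    return VisualSentence(frozenset(mat.values()), description, interp, mat)


# vs_k: a menu entry on screen stands for the e-document "report".
vs_k = VisualSentence(
    image=frozenset({"menu_entry_report"}),
    description={"report": {"type": "e-document"}},
    interp={"menu_entry_report": "report"},
    mat={"report": "menu_entry_report"},
)
vs_k1 = step(vs_k, "menu_entry_report")
```

After the step, `vs_k1.image` contains both the menu entry and the newly materialized document pane, and `vs_k1.characteristic_patterns()` lists the two cs/symbol associations of the new state.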

From the point of view of the users, interaction occurs with a Visual Interactive System (VIS), a dynamic open system that exists only as the result of the execution of the program specifying P, which determines the dynamics of the VIS with respect to the activities performed by the user. Hence, each visual sentence $vs_n = \langle i_n, d_n, \langle int_n, mat_n \rangle \rangle$ linking the state of P to the image on the screen describes an instantaneous configuration of the VIS. The set of visual sentences defines a visual language [36].

WIL capitalizes on the definition of visual languages and proposes a model of e-documents for visually editing, indexing and querying. Each document is seen as a set of units conveying different types of information organized according to conventions and notations developed by communities of practice. Such notational conventions constrain the layout and the appearance of the document, thus permitting the expression of tacit information. Therefore, visual models are considered that highlight the layout of the document, the relationships among the pieces of information composing the document, the document semantics and the author’s information target. Such models can manage both the spatial relationships typical of diagrammatic languages and the relationships deriving from the arrangement of the content representations. Thus, in considering a document, WIL focuses on (1) the identification of the set of information units that compose it; (2) their graphic representation as shapes on the screen; (3) the document description; (4) the relationship among the information unit components; (5) the relationship between the shapes and the information units they represent; and (6) the relationship among the shapes.

Based on the previous considerations, the adopted methodology for designing the WIL system has taken into account the following principles: (1) the interaction language of the system exploits notations usually adopted in the considered domain, (2) the system presents all and only the tools necessary to perform the user work, and (3) the system presents layouts according to the traditional layout of the tools of the domain.

WIL provides an innovative solution to the needs of document production using editing, retrieving, aggregation and personalization functionalities; these needs are satisfied by considering interaction problems, which can be classified with respect to
  • usability of the system and of the communication language;

  • efficiency of software which constrains system usability;

  • sharing client and server tasks in order to have a good representation of the virtual world;

  • interactive and adaptive capabilities of the system;

  • system capability of supporting dynamic adaptation of the tool and data by the user.

A solution to these problems would allow users to index and annotate documents locally according to their specific culture, in compliance with the transmission protocols and the general recording standard. Following this approach, users can act according to their cultures, competences and habits, and recorded information can be presented according to the interaction languages of different users.

5 Web documents and Information Objects

In the WIL approach, e-documents are assumed to be presented with their components, called Information Objects (IOs), organized in such a way as to provide suggestions about the document’s semantics [3]. The retrieval activity is therefore performed by letting users indicate the IOs of the documents they are interested in, their relationships and their spatial location. The system has to identify similar documents according to a set of specified criteria.

Retrieving web documents is a complex activity due to the existence of different types of IO (title, images, etc.), the heterogeneity of application domains (law, science, etc.), and the heterogeneity of sources and user profiles. The definition of the document structure is central to the XML approach [37].

XML considers documents and their components (IOs in WIL parlance) as typed elements. In particular, XML elements may be hierarchically structured, with each nested element described by a set of attributes. The structure of a set of XML documents is defined in a Document Type Definition (DTD) document, or with an XML Schema. The Document Object Model (DOM) for a document is a tree constructed in accordance with the typing information provided by the DTD for that document. An example is shown in Fig. 3.
Fig. 3 Example of the tree structure of a document
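
To make the tree view of a document concrete, the following minimal sketch parses a small XML document with Python's standard xml.etree.ElementTree module and lists its elements with their nesting depth. The element and attribute names (document, title, paragraph, image, src) are illustrative assumptions and are not taken from the WIL XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names are illustrative, not the WIL XML Schema.
SAMPLE = """
<document subject="demo">
  <title>Main title</title>
  <paragraph>
    <title>Section title</title>
    <paragraph>Body text</paragraph>
  </paragraph>
  <image src="figure.png"/>
</document>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element tree, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)
for depth, tag in outline(root):
    print("  " * depth + tag)
```

Each nested element corresponds to a node of the tree, so a titled paragraph appears as a paragraph node whose children are a title and an inner paragraph.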

In the WIL approach, semantic relationships between IOs are detected according to their presentation, and independently from the language, using a set of heuristics [3]. A Web document is described using a set of IO types including: Document, Title, Paragraph, Table, List_of_elements, Image, Sound, Video, Animation, Interactive Frame. Each IO usually has a presentation on the screen and a description in the system. The link between presentation and description has to be explicitly expressed; therefore, each IO in the system is modeled as a cp. Complex IOs can be decomposed into simpler IOs. The whole document is considered as a complex IO.

Some IOs may be missing, as in an Untitled Paragraph, or the body can be a simple IO, so that a Chapter composed of a Title and a simple body is very similar to a Paragraph. A Title briefly describes the content of the body. Furthermore, Documents, Chapters and Paragraphs may be organized according to a hierarchy whose levels are identified by numbers and reflected in a similar hierarchy of Titles. The way a Title is represented (e.g., its size and style) suggests its hierarchy level in the IO it is part of. The document constitutes the top of the hierarchy, and the IOs that characterize it are connected by the generalization, aggregation, classification and brother relationships. Their shapes are described by a set of structure attributes with pre, left, contain, and link operators (Fig. 4).
Fig. 4 Graphical representation of a document structure

Usually, if a shape precedes a second one, the contents of the two identified IOs can be (a) sequentially connected (brother), e.g., a paragraph followed by a second one or (b) a specification, explanation, refinement or generalization of the other (e.g., a title before a paragraph).

The types of IO previously introduced are specified in more detail as follows:

  • Document: provides a meaning/information. Its IO component types may be documents and/or other types defined later on. Document attributes are
    a. Subject: identifies the main topic of the Document.
    b. Author: contains the name(s) of those responsible for the content of the work.
    c. Date: refers to the day when the document was published.
    d. Keywords: the set of keywords characterizing the content of the document.

  • Title: synthetically describes the content of an IO. A document can contain the main Title and other Titles describing its different parts. A Title contains Text.

  • Paragraph: a semantically autonomous component of the document. It can contain Text, Title (Titled Paragraph) and any other type of IO introduced later on. Paragraphs can be arbitrarily nested.

  • Note: shortly comments on or explains the information contained in a document (or part of it). It contains text.

  • Caption: explains an image, a table, a movie or a video. It contains text.

  • List_of_elements: a complex element composed of a sequence of IOs.

  • Table: contains meta-data (data used for the description of table data) and data, respectively identified by fields and their content. It could contain a caption synthesizing the Table content, and notes, which provide more specific explanation.

  • Image: a type of IO associated with visual/graphical information, which represents parts or concepts of the real world. It could contain a caption, synthesizing the Image content.

  • Video: a sequence of images, possibly containing sound messages, usually acquired by a camera. Indexing the information it contains can support the information retrieval activity. Its attributes include the Title of the Video, the Location of the file used in replacement of the Video (if it is not available), an alternative Text to substitute both the Video and the file when neither is available, and a Caption (possibly empty).

  • Animation: differs from Video in that images are simpler, less realistic images, human artifacts created or converted in digital form.

  • Interactive Frame: used for data input in a database.

Each IO is materialized on the screen as a cs. The description of the IO is provided by an attributed symbol of the form ty(id, TL, BR, S1, …, Sn), where
  • ty is the Type of IO.

  • id is a unique identifier for the symbol.

  • TL and BR are coordinate pairs identifying the enclosing rectangle for the IO representation.

  • S1, …, Sn is a sequence of attributes depending on the type ty. Typically, they contain data that can be exploited for a conventional cataloguing of the IO (e.g., author, title, content, keyword, date, etc.).

A particular attribute, cmp, describes the possible decomposition of the IO at hand. It is assigned a null value if the IO is considered a simple one or if its further decomposition as a complex IO is considered irrelevant. cmp is an expression composed of symbols connected by means of the operators and the relations introduced in the following.
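
The attributed symbol ty(id, TL, BR, S1, …, Sn) can be pictured as a simple record. The sketch below is only an illustration under stated assumptions: the class name AttributedSymbol, the field names, the use of a dictionary for S1, …, Sn, and the choice of a string for cmp are all hypothetical and not part of the WIL specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class AttributedSymbol:
    """Sketch of ty(id, TL, BR, S1, ..., Sn); all field names are assumptions."""
    ty: str                       # type of the IO, e.g. "Title" or "Paragraph"
    id: str                       # unique identifier of the symbol
    tl: Point                     # top-left corner of the enclosing rectangle
    br: Point                     # bottom-right corner of the enclosing rectangle
    attributes: Dict[str, object] = field(default_factory=dict)  # S1, ..., Sn
    cmp: Optional[str] = None     # decomposition expression; None plays the role of null

    def is_decomposed(self) -> bool:
        """True when a decomposition of the IO has been recorded in cmp."""
        return self.cmp is not None


# Hypothetical instances: a simple Title and a Titled Paragraph that contains it.
title = AttributedSymbol("Title", "tau", (10, 10), (200, 30), {"keywords": {"WIL"}})
para = AttributedSymbol("Paragraph", "pi2", (10, 10), (200, 120), cmp="tau prec pi3")
```

A null cmp does not distinguish a genuinely simple IO from a complex IO whose decomposition is considered irrelevant, which matches the description given above.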

The relation between the attributed symbols describing the IOs and the characteristic structures that represent them is expressed through the two functions int and mat, thus completing the characterization of an e-document as a visual sentence.

Each IO structuring a Web document can be connected to a set of operators (positional and connection operators). Operators are introduced and described in [3]. Each operator (e.g., cont ≡ contains; prec ≡ precedes) is associated with one or more semantic relationships. The introduced semantic relationships are: generalization, aggregation, classification and brother. Generalization, aggregation, and classification establish an order between a couple of operands. This order implies that the commutative property is not valid for these relations. The brother relationship is instead commutative.
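
As an illustration of how positional operators might be evaluated from the enclosing rectangles TL and BR, the sketch below tests cont and prec on three boxes. The precise definitions of the operators are given in [3]; the geometric tests used here (rectangle enclosure for cont, vertical ordering for prec, screen coordinates with y growing downward) are simplifying assumptions, and the names Box, cont and prec are hypothetical.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Enclosing rectangle of an IO in screen coordinates (y grows downward)."""
    tl: Tuple[float, float]
    br: Tuple[float, float]


def cont(a: Box, b: Box) -> bool:
    """Assumed test for 'a contains b': the rectangle of b lies inside that of a."""
    return (a.tl[0] <= b.tl[0] and a.tl[1] <= b.tl[1]
            and b.br[0] <= a.br[0] and b.br[1] <= a.br[1])


def prec(a: Box, b: Box) -> bool:
    """Assumed test for 'a precedes b': a ends at or above the line where b starts."""
    return a.br[1] <= b.tl[1]


document = Box((0, 0), (300, 400))
title = Box((10, 10), (290, 40))
paragraph = Box((10, 50), (290, 200))

print(cont(document, title), cont(title, document))    # operand order matters
print(prec(title, paragraph), prec(paragraph, title))  # operand order matters
```

Swapping the operands changes the result in both cases, which is the non-commutative behaviour described above for ordered relations.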

Given the cs of a Web document, its interpretation identifies the type of IOs composing this document. For example, consider a Document π1 that contains a Titled Paragraph π2 composed of a Title τ followed by an Untitled Paragraph π3. Their characteristic structures cs are described respectively by the formulae:
$$ \begin{gathered} {\text{dcs}}_{{{{\uppi}}_{1}}} = T_{{{{\uppi}}_{1}}} ({\text{id}}_{{{{\uppi}}_{1}}} ,(x_{1} ,y_{1}),(x_{2} ,y_{2}),\;{\text{cmp}} = {\text{cmp}}_{{{{\uppi}}_{1}}} ,\;{\text{keywords = \{word}}_{ 1 1} {\text{\})}} \hfill \\ {\text{dcs}}_{{{{\uppi}}_{2}}} = T_{{{{\uppi}}_{2}}} ({\text{id}}_{{{{\uppi}}_{2}}} ,(x_{3} ,y_{3}),(x_{4} ,y_{4}),\;{\text{cmp}} = {\text{cmp}}_{{{{\uppi}}_{2}}} ,\;{\text{keywords = \{word}}_{ 2 1} , {\text{word}}_{ 2 2} \}) \hfill \\ {\text{dcs}}_{{{{\uppi}}_{3}}} = T_{{{{\uppi}}_{3}}} ({\text{id}}_{{{{\uppi}}_{3}}} ,(x_{5} ,y_{5}),(x_{6} ,y_{6}),\;{\text{keywords = \{word}}_{ 3 1} , {\text{word}}_{ 3 2} , {\text{ word}}_{ 3 3} {\text{\})}} \hfill \\ {\text{dcs}}_{{{\uptau}}} = T_{{{\uptau}}} ({\text{id}}_{{{\uptau}}} ,(x_{7} ,y_{7}),(x_{8} ,y_{8}),\;{\text{keywords = \{word}}_{ 4 1} {\text{\})}} \hfill \\ \end{gathered} $$
and the values of the cmp attribute are expressed as follows:
$$ ({\text{a}})\;{\text{cmp}}_{{{{\uppi}}_{1} }} = {{\uppi}}_{1} \,cont\,{\text{cmp}}_{{{{\uppi}}_{ 2} }} \quad ({\text{b}})\;{\text{cmp}}_{{{{\uppi}}_{ 2} }} = {{\uptau}}\,prec\,{{\uppi}}_{3} $$
so that
$$ {\text{dcs}}_{{{{\uppi}}_{1} }} = T_{{{{\uppi}}_{1} }} ({\text{id}}_{{{{\uppi}}_{1} }} ,(x_{1} ,y_{1} ),(x_{2} ,y_{2} ),\;{\text{cmp}} = {{\uppi}}_{1} \,cont\,{{\uptau}}\,prec\,{{\uppi}}_{3} ,\;{\text{keywords}} = \{ {\text{word}}_{11} \} ) $$
The extended expression for \( {\text{cmp}}_{{{{\uppi}}_{ 1} }} \) is represented as a binary tree in Fig. 5.
Fig. 5 Tree representation of the value of cmp for a complex IO

In general, the value of cmp is represented as a binary tree with the following features:
  • the root is associated with the = symbol,

  • the left successor of the root (leaf) is associated with an identifier for the whole,

  • each internal node is associated either with an operator or with an IO and its variables,

  • there exists an order between successors of internal nodes (the commutative property on the defined set of operands is not usually valid).
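
The tree features listed above can be sketched with a small binary-tree type. The example encodes one possible reading of the expanded expression for the cmp of π1: the root carries the = symbol, its left successor is a leaf naming the whole, and the right subtree holds the operators and IOs. The Node class, the ASCII labels (pi1, tau, pi3, cmp_pi1) and the grouping of cont over prec are assumptions made for illustration and may differ from the actual layout of the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Binary tree node; leaves have no successors."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def infix(node: Node) -> str:
    """Render a tree whose internal nodes all have two ordered successors."""
    if node.left is None and node.right is None:
        return node.label
    return f"({infix(node.left)} {node.label} {infix(node.right)})"


# Assumed encoding of cmp for pi1 = pi1 cont (tau prec pi3).
cmp_pi2 = Node("prec", Node("tau"), Node("pi3"))
cmp_pi1 = Node("=", Node("cmp_pi1"), Node("cont", Node("pi1"), cmp_pi2))

print(infix(cmp_pi1))  # (cmp_pi1 = (pi1 cont (tau prec pi3)))
```

Because successors are ordered, exchanging the left and right successors of the prec node yields a different tree and therefore a different description.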

In order to support document description for its indexing, the variables contained in the attributed symbol, including the reference for the tree in cmp, have to be associated with leaves of the cmp tree (except for the left successor of the root) (Fig. 6).
Fig. 6 Extended tree representation of a cmp tree with emphasis on its variables

Because the successors of internal nodes are ordered, the inclusion of the target tree T in the document tree may be tested according to the approach proposed in [38] for ordered tree inclusion, and the result used for retrieval.
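
The idea of ordered tree inclusion can be illustrated with a naive recursive check: a pattern is included in a target if it can be obtained from the target by deleting nodes, where the children of a deleted node take its place in the same left-to-right order. The sketch below is not the algorithm of [38]; it is a simple memoized recursion intended only for small trees, and the tuple encoding of trees and the function names are assumptions.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be cached.


@lru_cache(maxsize=None)
def forest_included(pattern, target):
    """True if the ordered forest `pattern` results from deleting nodes of `target`."""
    if not pattern:
        return True
    if not target:
        return False
    (t_label, t_children), t_rest = target[0], target[1:]
    # Option 1: delete the first target root; its children take its place.
    if forest_included(pattern, t_children + t_rest):
        return True
    # Option 2: map the first pattern root onto the first target root.
    (p_label, p_children), p_rest = pattern[0], pattern[1:]
    return (p_label == t_label
            and forest_included(p_children, t_children)
            and forest_included(p_rest, t_rest))


def tree_included(pattern_tree, target_tree):
    """Ordered inclusion test for single trees."""
    return forest_included((pattern_tree,), (target_tree,))


# Hypothetical document tree and two queries that differ only in operand order.
doc = ("cont", (("pi1", ()), ("prec", (("tau", ()), ("pi3", ())))))
query = ("cont", (("tau", ()), ("pi3", ())))
reversed_query = ("cont", (("pi3", ()), ("tau", ())))

print(tree_included(query, doc), tree_included(reversed_query, doc))
```

The second query fails because the left-to-right order of tau and pi3 is preserved by the inclusion test, in line with the non-commutativity of the operators.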

6 Integrated management of multimedia resources

The introduction of continuous media, such as video, as Information Objects exploits a recent extension of the theory of visual sentences to multimedia sentences [39]. This extension raises some specific questions about the definition of their support, hence about their materialization and interpretation, as well as concerning the annotation task, a fundamental one in managing e-documents in WIL. For example, two dimensions alone cannot characterize the continuous “media”, for which considerations about time must be introduced, in particular as regards the relation between the continuous time of perception and the discrete time in which the computer governs the production of the representation. In the case of movies, the existence of two channels has to be taken into account to correctly determine if an annotation has to be referred to the audio or to the video channel.

In this sense, the notion of zone provides a uniform approach to the problem of the annotation. Let selection be a function that associates a value from the set {0, 1} to each element of the multidimensional support of a document. A zone of a document is isomorphic to a version of the document such that every element of the support, to which the selection function associates the 1 value, maintains the same value as in the original document, while every other element receives a value that constitutes the bottom element in the partial order naturally associated with the presentation channel (transparent for the images, silent for the audio, etc.).
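
A minimal sketch of the zone construction for a two-dimensional support follows. It assumes that a document is a grid of values, that selection is a function from grid positions to {0, 1}, and that None stands for the bottom value of the channel (e.g., transparent); the names zone, selection and top_left_square are hypothetical.

```python
from typing import Callable, List, Optional

Pixel = Optional[int]          # None plays the role of the bottom value
Image = List[List[Pixel]]


def zone(document: Image,
         selection: Callable[[int, int], int],
         bottom: Pixel = None) -> Image:
    """Keep the original value where selection is 1, use the bottom value elsewhere."""
    return [
        [value if selection(r, c) == 1 else bottom for c, value in enumerate(row)]
        for r, row in enumerate(document)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]


def top_left_square(r: int, c: int) -> int:
    """Select the 2 x 2 square in the top-left corner of the support."""
    return 1 if r < 2 and c < 2 else 0


print(zone(image, top_left_square))
# [[1, 2, None], [4, 5, None], [None, None, None]]
```

The same scheme extends to other channels by changing the support and the bottom value, for instance a list of audio samples with silence as bottom.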

The annotation of a document is therefore a set of sub-annotations, each associated with a set of zones (possibly a singleton), characterized by a set of meta-data that allow their identification, and possibly by some content.

Another problem connected with the annotation of Web information is the updating of the content between the time of annotation and the retrieving of the content. This problem can be due to
  • link obsolescence, i.e., to elimination or displacement of the original resources,

  • the dynamic nature of the document, e.g., a Web page produced as the result of a query, or the home page of a portal or its section,

  • the lack of versioning of the considered document.

In the rest of the paper, material from continuous media is not considered, and the focus is mainly on static documents containing text and/or images. In particular, documents and annotations are considered which are produced and managed inside an organization, including the cache memorization of copies for external annotated material, so that a correspondence between each of the zones referred to in the database containing annotations and the zones of the original document is guaranteed. Consequently, it is proposed to store annotations that are represented by specific characteristic structures indicating the annotated zone, and whose associated description contains at least:
  • Uniform Resource Locator (URL) of the original document;

  • URL of the cache copy, absent if the document is internal to the organization;

  • the author of the annotation (the author is the owner of the client connection used for the annotation);

  • the annotation type;

  • the start time of the annotation's creation;

  • the end time of the annotation's creation;

  • the list of the component annotations and, for each of them, the list of the zones to which it is associated, each one represented by zone locator and indicator of the tool of the zone selection;

  • the annotation content.

The latter can be in the form of (a) an XML document which allows the annotation of material using some other multimedia material (e.g., a vocal comment) by linking it to the files of interest; (b) a previously created file; or (c) a simple sequence of characters.
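
The minimal description of a stored annotation listed above can be summarized as a record type. The sketch below is illustrative only: the class and field names, the use of strings for locators and content, and the example URL are assumptions and do not reproduce the actual WIL, MADCOW or BANCO storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ZoneRef:
    locator: str          # zone locator; its concrete format is an assumption
    selection_tool: str   # indicator of the tool used to select the zone


@dataclass
class ComponentAnnotation:
    zones: List[ZoneRef]  # zones to which this component annotation is associated


@dataclass
class AnnotationRecord:
    document_url: str                 # URL of the original document
    cache_url: Optional[str]          # None when the document is internal
    author: str                       # owner of the client connection
    annotation_type: str
    created_start: datetime
    created_end: datetime
    components: List[ComponentAnnotation] = field(default_factory=list)
    content: str = ""                 # XML text, a file reference, or plain characters

    def is_internal(self) -> bool:
        """True when no cache copy is recorded, i.e. the document is internal."""
        return self.cache_url is None


# Hypothetical record for an internal document annotated with a plain-text comment.
record = AnnotationRecord(
    document_url="http://example.org/report.html",
    cache_url=None,
    author="alice",
    annotation_type="comment",
    created_start=datetime(2008, 1, 15, 10, 0),
    created_end=datetime(2008, 1, 15, 10, 5),
    components=[ComponentAnnotation([ZoneRef("zone-1", "rectangle")])],
    content="Check this figure.",
)
```

Keeping the start and end times as separate fields allows the duration of the annotation session to be derived without storing it explicitly.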

7 Personalizable definition of the annotations

The annotation management component of the WIL system integrates the MADCOW approach (Multimedia Annotation of Digital Content Over the Web) [39, 40] and the BANCO (Browsing Adaptive Network for Changing user Operativity) [41] approach to annotating any type of multimedia component available through a Web browser. Both approaches allow a user to select zones in the document according to their format and associate annotations to them, composed of text, other multimedia documents, or links to other material available on the Web. Annotations are transmitted in XML format (BANCO developed IM2L, an XML-based language, to describe documents and annotations [41]) to a server, where their content is stored in a database. Annotations can be retrieved, starting from the original page or independently, according to several criteria, such as author, creation date, type of annotation or textual content, and presented on the browser as HTML pages in the MADCOW approach or as SVG documents in the BANCO approach. The annotations can then be annotated in turn.

MADCOW identifies a limited set of types of annotations, which can also be used for annotation retrieval. A first simple personalization of the interface for annotation management consists in allowing the choice of the icons for representation of the different types. Moreover, the interface can be personalized by means of the definition of the interaction tools for defining zones in the document (e.g., text selection by mouse dragging, selection of image areas by sketching simple boundaries such as ovals, rectangles, polylines). In principle, there is no constraint about the type of shapes that a user can decide to use. However, users inspecting an annotation have to be able to retrieve the shapes used by the author of the annotation.

The definition of relationships between different components of the document is interesting also for retrieving documents containing some particular characteristics (e.g., a document formatted using one column, two columns, etc.). In this case, the original information from the document representation has to be distinguished, and the set of the allowed spatial relationships defined, while the strategies adopted by the user for pointing or highlighting the presence of these relationships are personalizable.

The production modalities of annotations and the interaction with existing annotations can be personalized by defining a finite set of actions (e.g., (de)selection, deletion, duplication) and action languages that combine elementary gestures.

The annotations connected to the same document are presented by materializing the current document with icons (see Fig. 7), associated with the types of the annotation, acting as placeholders for annotations relative to a given location in the document, and showing the actual annotations (and possibly filtering them) on demand (Fig. 8). However, this presentation is not personalizable. If more than one annotation is contained in the same zone, then the corresponding icons are located sequentially.
Fig. 7 A page with placeholders for annotations

Fig. 8 The visualization of an annotation over the annotated document

8 The WIL architecture for e-document management and manipulation

The architecture of the WIL system (Fig. 9) is structured in three different layers: Surface, Middleware and Data space. The surface contains the tools to define and exploit the languages enabling a user to visually interact with WIL, edit a document, formulate his/her interpretation of the document, and formulate a query to the system. Files and meta-data are produced and fed to the middleware, which uses them for updating the stored documents.
Fig. 9 WIL functional architecture: the document management component

WIL has been designed as a plastic visual system. The exploitation of the UIMS’s (User Interface Management System) facilities enables users to define their own working environment as an application workshop, to use and refine it.

Users exploit meta-information to visually specify the activity to be executed at the lower level: they define the information items to be accessed and processed, the programs, the necessary interactions for their execution, and the visualization mode of their results.

The design of the navigation structure of the application—translating users’ needs, habits, notations (as previously identified) into functional specifications for data organization, structuring contents, and paths—is put into operation by the middleware.

The surface is considered as an instrument for system observation and navigation through reliable paths. It is composed of Web documents, which may be passive, active, or dynamic and defined according to the communication requirements for the task.

Passive pages of text, graphics and hyperlinks are used for descriptive components, such as catalogue contents, table of contents and information for help online.

Active pages, which also include interactive graphic user-interface components, such as forms, are used to allow users to input data.

Dynamic pages change over time, modifying their contents and/or layout to correctly present a document while fixing its associated annotations.

Pages of all three types have been produced that propose arrangements (views on data) allowing users to focus on specific tasks and to organize data collected from the data space in any form (formatted text, tables, maps, graphics). The design of the surface specifies how information is presented to the users and how they can interact with it. This has implied decisions about the adoption of a user-centered style of interaction not limited to focus on user-friendliness, but always taking into account the guidelines for the usability of new web applications. Users interact with Web pages to collect data (formatted text, tables, maps, graphics) presented in meaningful arrangements that focus their attention on specific tasks.

The data space is the set of all the permanent data necessary for the specialist’s analysis (including the database), for describing the layouts for data presentation, the involved processes and the input, output, and state variables for their correct execution.

User–system interaction is performed through an interface that (1) is user friendly, aiming at offering easy interaction to general users who very often remain permanent novices with respect to the data accessed and the technologies used; (2) is user centered, referring to the user’s traditional paradigms and to his or her daily work metaphors when presenting data and offering tools for the organization, description, and presentation of documents and composing these metaphors into leitmotifs that mimic the user’s workplace, to facilitate browsing for the successful search of online information; and (3) provides navigation aids tailored to the application domain.

Interaction with WIL is performed by exploiting visual languages that formalize users’ traditional diagrammatic notations, in which verbal parts are intermingled with images, plots, diagrams and sketches.

The Indexing language specification feeds the Annotation editor with the primitives available for interactive indexing of the documents by content. A specification takes into consideration: the document fragments; their materialization; the description of the document and of its parts; the relationships among the fragments.

Figure 10 shows the Annotation editor: in the work area a working bench, the window containing the document (an image) and the tools to operate on it (in the toolbar on the top of the working bench) are displayed. A user can select the draw-line tool from the toolbar to draw a line for identifying a zone of interest (Fig. 10 depicts an opaque shield surrounded by a rectangle in the box in the bottom of the image). Then, the user selects the textual annotation tool (the “a” icon in the toolbar) and a pixel in the rounded zone: a pencil icon appears on it and the annotation manager (in Fig. 10, the window on the right of the working bench) is activated. The user can write her/his content in the annotation manager.
Fig. 10 Performing annotation on an image involves the selection of the set of pixels relative to a content written in the annotation manager

The Document and Annotation editor allows an existing/new document to be edited and/or annotated, i.e., visually indexed using the language defined by the designer by exploiting the module for Indexing language specification. A structural-semantic description of the document is produced which specifies the different fragments of the document, their spatial relationships, the used formats and the terms associated with these components, as could be derived from some domain ontology. This is integrated by a parsing activity which generates a Document Descriptor which is validated according to the WIL XML Schema and indexed. Queries can thus be performed both on the annotation content and in traditional ways.

The user accesses information by means of a query language, which allows the specification of both the content and the structure of the target document according to its semantics.

The user is able to define an HTML or XML document combining IOs coherently with the Indexing language and exploiting an editing language specified via the UIMS.

Furthermore, an HTML or XML document, accessed via Internet or Intranet, can be analyzed by the user who interprets (indexes) it by identifying the IOs of the document and interacting through a visual language designed by means of the UIMS and coherent with the Indexing language. The Annotation editor (like the Document editor) is based on the indexer, that is, the activator of the Indexing language; consequently, it produces the indexing structure of the content (the user’s interpretation constituted by the combination of fragments) and submits it to the Storage engine for its input in the Annotation archive.

The Query composer’s interface offers specific functionalities with an interaction environment similar to the editing environments. A query is expressed by means of a visual sentence which associates a document with its description (e.g., a query and a target document). Figure 11 shows the visual editor of the Query composer. The user formulates his/her request by operating in the graphical working area to edit graphic objects which represent different types of fragments: s/he can select the possible IOs from the toolbar on the top of the visual editor and drag and drop them in the working area. Through the tools specifying the relationships among the IOs, the user composes the IOs to define the graphical shape of the searched type of documents. A syntactic analysis functionality verifies that the drawing is one of the graphic compositions admissible in the system.
Fig. 11 The visual editor of the Query composer. The user is defining the components of a query language by developing icons for documents, tables and images

Queries can also be expressed in the form of sketches, in order to retrieve elements associated with the shape or icon represented as the one sketched by the user (Fig. 12).
Fig. 12 The Sketcher, the interface for producing sketches and retrieving associated images. The user has sketched the icon for a titled paragraph

In the visual editor, the user selects some IOs from the icons in the toolbar; in the Sketcher (Fig. 12) the user draws a schematic representation of a set of IOs composing a document: the sketch is approximate and possibly ambiguous, but it is easy to use and allows an intuitive and incremental definition of the searched document.

When the user has composed or sketched the set of IOs of the document, s/he can submit the request. Different modules manage the user requests: the Query parser parses the syntax of the queries; the Search engine executes the requests and prepares the system replies; the Page composer processes them so that they are correctly presented to the user; the Query interpreter interprets the user requests and interacts with the Indexing language activator to determine the actions to be fired; the Query manager fires the actions connected to the query and activates the Search engine to start the search for documents satisfying the user request.

The authoring of a new document, the modification of an existing one, and all the operations performed by the user to reach a correct indexing of the document content are parsed by the Annotation manager, which creates an index structure to efficiently manage the documents’ content.

The Indexing language activator is able to read fragments or documents and to translate them in a form suited for the target agent.

The Annotation archive is the catalogue for accessing the documents. It is integrated with the structure used for finding by keyword.

9 Conclusions

This paper has presented the methodology and the functional architecture of the WIL system, a system that supports annotation, editing, indexing and retrieving of Web documents, and exploits the description of the logical structure of the documents to extract visual information. It is indeed considered that this description contains powerful suggestions as to the relevance and meaning of the parts of a document. To this aim, an e-document has been described as composed of organized Information Objects, whose spatial relationships express semantics related to a specific context of use. User culture is captured both by annotation and sketching tools: the former capture the specific terminology of communities of practice, the latter identify secondary notations. In this way, e-documents can be indexed (and consequently retrieved) using controlled or free-text annotations and structural descriptions.

Future developments of WIL will address the extension of the document indexing methods to include the management of domain ontologies, which could allow a more uniform management of domain-specific languages. Open source projects already exist which offer mechanisms for ontology management that could be easily integrated to fruitfully extend the modules for document indexing, archiving and retrieval in WIL.

Copyright information

© Springer-Verlag 2009