Reengineering PDF-based documents targeting complex software specifications
by Mehrdad Nojoumian; Timothy C. Lethbridge
International Journal of Knowledge and Web Intelligence (IJKWI), Vol. 2, No. 4, 2011
Abstract: We discuss how to reengineer complex PDF-based documents, such as specifications and technical books, so that end users have a better experience with them. Specifications of the Object Management Group (OMG) are our initial targets. Such specifications are dense, difficult to use, and tend to have complicated structures. Our approach includes format conversion, logical structure extraction, text extraction and multi-layer hypertext generation. Logical structure extraction is central, and results in an XML document with a schema tailored to the type of document. Many key concepts of a document are expressed in this schema, including concepts extracted from the patterns of words used in headings. For example, in OMG specifications, package relationships and class associations can often be extracted from the wording of headings. When we produce, in the final step, a multi-layer hypertext version of the document, these extracted concepts allow a richer user experience.
Online publication date: Thu, 26-Jan-2012
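To illustrate the idea of extracting concepts from the wording of headings, here is a minimal Python sketch. It is not the authors' implementation: the heading form "number title (from Package, ...)", the function name heading_to_element, and the XML element and attribute names (section, fromPackage, number, title, name) are all assumptions made for this example, and the abstract does not describe the actual schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading form (hypothetical, chosen for illustration):
#   "<section number> <title> (from <Package>[, <Package>...])"
# The "(from ...)" suffix is optional.
HEADING_RE = re.compile(
    r"^(?P<number>\d+(?:\.\d+)*)\s+"
    r"(?P<title>.+?)"
    r"(?:\s+\(from\s+(?P<packages>[^)]+)\))?$"
)


def heading_to_element(heading: str) -> ET.Element:
    """Turn one heading line into an XML element (hypothetical schema)."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        raise ValueError(f"unrecognised heading: {heading!r}")

    section = ET.Element(
        "section",
        number=match.group("number"),
        title=match.group("title"),
    )

    # Record each package named in the heading as a child element,
    # so the relationship is available to later processing steps.
    packages = match.group("packages")
    if packages:
        for name in packages.split(","):
            ET.SubElement(section, "fromPackage", name=name.strip())

    return section


if __name__ == "__main__":
    element = heading_to_element("7.3.7 Class (from Kernel, Basic)")
    print(ET.tostring(element, encoding="unicode"))
```

A heading without the suffix, such as "1 Scope", yields a section element with no fromPackage children, so the same function can be applied to every heading in a document.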