Title: Automatic ontology generation from patents using a pre-built library, WordNet and a class-based n-gram model

Authors: Zhen Li; Derrick Tate

Addresses: Department of Mechanical Engineering, Texas Tech University, Lubbock, Texas 79409, USA; Department of Industrial Design, Xi'an Jiaotong-Liverpool University, Suzhou, China

Abstract: An ontology is defined as a structured, hierarchical way of describing domain knowledge. Research in ontological engineering has yielded fruitful results, but existing methods share a common drawback: they require significant manual work to generate an ontology, which limits their usefulness in practice. In this paper, we propose a computational model that combines data mining, natural language processing (NLP), WordNet and a novel class-based n-gram model for automatic ontology discovery and recognition from existing patent documents. A pre-built ontology library was constructed by gathering knowledge from engineering textbooks and dictionaries. A data set of engineering patent claims was then split into training (80%) and validation (20%) subsets. The pre-built library and WordNet were used to generate class labels, from which class-based n-gram models were constructed during training. Holdout validation showed an average accuracy of 87.26% across all validation samples.
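
The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a tiny hypothetical word-to-class table (WORD_CLASS) standing in for the pre-built library and WordNet, maps tokens to class labels, performs an 80/20 holdout split, and counts class-level bigrams. A full class-based n-gram model would typically also include a word-given-class probability, which is omitted here. All names and the toy vocabulary are assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical lookup standing in for the pre-built library and WordNet.
WORD_CLASS = {
    "shaft": "COMPONENT", "gear": "COMPONENT", "housing": "COMPONENT",
    "rotates": "ACTION", "drives": "ACTION", "supports": "ACTION",
}

def to_classes(tokens):
    """Map each token to its class label; unknown words become OTHER."""
    return [WORD_CLASS.get(t, "OTHER") for t in tokens]

def split_holdout(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split into training and validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_class_bigrams(sentences):
    """Count how often each class label follows another (class bigrams)."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        classes = ["<s>"] + to_classes(tokens)  # <s> marks sentence start
        for prev, cur in zip(classes, classes[1:]):
            counts[prev][cur] += 1
    return counts

def predict_next_class(counts, prev_class):
    """Return the most frequent class seen after prev_class, else OTHER."""
    following = counts.get(prev_class)
    if not following:
        return "OTHER"
    return following.most_common(1)[0][0]
```

With such counts, a validation claim could be scored by comparing the predicted class at each position against the label from the lookup table; how the paper actually computes its accuracy figure is not stated in the abstract.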

Keywords: ontological engineering; n-gram language models; natural language processing; NLP; ontology generation; patents; computational models; data mining; automatic ontology discovery; ontology recognition; ontology library.

DOI: 10.1504/IJPD.2015.068965

International Journal of Product Development, 2015 Vol.20 No.2, pp.142 - 172

Received: 10 Apr 2013
Accepted: 17 May 2014

Published online: 22 Apr 2015
