Leveraging text content for management of construction project documents

Mohammed Alqady, Purdue University

Abstract

The construction industry is a knowledge-intensive industry, and construction projects generate thousands of documents. Documents, as information carriers, must be managed effectively to ensure successful project management. The fact that a single project can produce thousands of documents, many of them in a textual/unstructured format, greatly complicates the task of information management. Conventionally, project documents are organized by classifying them according to fixed/predefined classes and document metadata, e.g., document type, originator, project attribute, specification division, or date. While such a classification method is easy to implement, it only aids document search and retrieval if the document seeker has prior knowledge of the content of the document corpus. In many cases, and for various project management activities, this is not the case, and the search becomes frustrating, with delayed or incomplete results. An alternative framework for organizing project documents based on document content is proposed. The framework takes into account important characteristics of construction project documents and leverages them to facilitate document search and retrieval. The premise of the framework is that documents are not produced haphazardly but are generated as a result of certain events or circumstances occurring in the project. As such, documents can be linked to each other on the semantic level, a point overlooked by document management systems, which generally manage documents in vacuo, disregarding or failing to utilize such semantic connections between the documents.
Organizing project documents based on the semantic relations that exist between them (revealed from the document content and not just the document attributes) facilitates information retrieval and retains the knowledge of the actual project participants, thereby supporting knowledge reuse. Another aspect of the thesis investigates the use of document content analysis to enable automated document management. If textual similarities between documents correlate with what human users recognize through their semantic abilities, then content analysis can be used to automatically organize documents according to the proposed framework. Text classifiers based on machine learning techniques were evaluated to determine their performance in identifying the group of semantically similar documents to which a test document belongs. In addition, an unsupervised learning method was adapted and evaluated for the task of clustering documents, based on textual similarity, into sets of semantically related documents. The purpose of these evaluations is to equip electronic document management systems with content analysis capabilities that facilitate document search and retrieval.
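The abstract does not say which text representation, classifiers, or clustering algorithm were used. As a minimal sketch only, assuming a TF-IDF bag-of-words representation and cosine similarity, the snippet that follows uses a nearest-centroid classifier as a stand-in for the machine-learning classifiers and threshold-based single-link clustering as a stand-in for the adapted unsupervised method. The function names, the 0.2 threshold, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (assumed variant): rarer terms weigh more, none reach zero.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors into one centroid vector."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def classify(vec, labeled):
    """Nearest-centroid classifier: return the group whose centroid is most
    cosine-similar to vec. `labeled` maps group name -> list of vectors."""
    centroids = {g: centroid(vs) for g, vs in labeled.items()}
    return max(centroids, key=lambda g: cosine(vec, centroids[g]))


def cluster(vectors, threshold=0.2):
    """Single-link clustering: documents whose pairwise cosine similarity is at
    least `threshold` end up in the same group (connected components)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical project documents: an RFI thread and a glazing submittal thread.
docs = [
    "RFI 12: clarify rebar spacing at footing",
    "Answer to RFI 12: rebar spacing at footing revised",
    "Change order: added rebar at footing",
    "Submittal: curtain wall glazing shop drawings",
    "Glazing submittal returned, revise curtain wall anchors",
]
new_doc = "Resubmitted glazing shop drawings with new curtain wall anchors"

vectors = tfidf_vectors(docs + [new_doc])  # shared vocabulary and IDF
corpus, new_vec = vectors[:-1], vectors[-1]

print(cluster(corpus))  # [[0, 1, 2], [3, 4]]
labeled = {"footing rebar": corpus[:3], "curtain wall": corpus[3:]}
print(classify(new_vec, labeled))  # curtain wall
```

Because the two hypothetical threads share no vocabulary, their cross-thread similarity is exactly zero, so clustering separates them and the resubmittal is assigned to the glazing group. A real project corpus would likely need stop-word removal, stemming, and a tuned threshold before textual similarity tracks the semantic relations that human users recognize.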

Degree

Ph.D.

Advisors

Kandil, Purdue University.

Subject Area

Information Technology|Civil engineering
