High-throughput biological imaging uses automated imaging devices to collect large numbers of microscopic images for the analysis of biological systems and the validation of scientific hypotheses. Efficient manipulation of these data sets for knowledge discovery requires high-performance computational resources, efficient storage, and automated tools for extracting such knowledge and sharing it among different research sites. Newly emerging grid technologies provide powerful means for exploiting the full potential of these imaging techniques. Efficient utilization of grid resources requires the development of knowledge-based tools and services that combine domain knowledge with analysis algorithms. In this paper, we first investigate how grid infrastructure can facilitate high-throughput biological imaging research, and present an architecture for providing knowledge-based grid services for this field. We identify two levels of knowledge-based services: first-level services provide tools for extracting spatio-temporal knowledge from image sets, and second-level services provide high-level knowledge management and reasoning. We then present the Cellular Imaging Markup Language (CIML), an XML-based language for modeling biological images and representing spatio-temporal knowledge. This scheme can be used for spatio-temporal event composition, matching, and automated knowledge extraction and representation for large biological imaging data sets. We demonstrate the expressive power of this formalism through examples and experimental results.

Date of this Version

February 2007