Information hiding in printed documents

Aravind K Mikkilineni, Purdue University

Abstract

In today's digital world, securing different forms of content is very important for protecting copyright and verifying authenticity. One example is the watermarking of digital audio and images. We believe that a marking scheme analogous to digital watermarking, but for documents, is very important. Techniques currently exist to secure documents such as bank notes using paper watermarks, security fibers, holograms, or special inks. There are a number of applications in which it is desirable to identify the technology, manufacturer, model, or specific unit that was used to print a given document, even if the printer in question does not use these existing security devices to explicitly identify itself. It would be useful to achieve the same or a better level of protection without any additional devices or technologies. Two strategies are proposed for printer identification based upon examination of a printed document. The first strategy is passive: it involves characterizing the printer by finding features in the printed document that are intrinsic to that particular printer, model, or manufacturer's products. The second strategy is active: it involves embedding an extrinsic signature in a printed page. This signature can be generated by modulating the process parameters of the printer mechanism to encode identifying information such as the printer serial number and the date of printing. It is shown that good separation between printers is achievable using gray-level co-occurrence based texture features obtained from text documents. Experiments using ten printers and a support vector machine classifier show very low classification error, even between printers with the same electromechanical structure. The technique is also shown to work across various font sizes, font types, paper types, and printer ages.
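To illustrate the passive strategy, the following sketch computes a gray-level co-occurrence matrix (GLCM) for a grayscale patch and derives a few standard statistics from it, using only NumPy. The displacement (dr, dc), the number of gray levels, and the particular statistics (contrast, energy, homogeneity, entropy) are assumptions chosen for illustration. The feature set, the image region (for example, a single scanned text character), and the preprocessing used in the dissertation may differ.

```python
import numpy as np


def glcm(img: np.ndarray, dr: int = 0, dc: int = 1, levels: int = 256) -> np.ndarray:
    """Normalized gray-level co-occurrence matrix for a non-negative displacement (dr, dc)."""
    img = np.asarray(img, dtype=np.intp)
    h, w = img.shape
    # Pair each pixel (r, c) with its neighbor (r + dr, c + dc) when both lie inside the image.
    a = img[: h - dr, : w - dc]
    b = img[dr:, dc:]
    counts = np.zeros((levels, levels), dtype=float)
    # np.add.at accumulates correctly even when the same (i, j) pair occurs many times.
    np.add.at(counts, (a.ravel(), b.ravel()), 1.0)
    return counts / counts.sum()


def glcm_features(p: np.ndarray) -> dict:
    """A few classic Haralick-style statistics of a normalized GLCM (illustrative choice)."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }


# Toy 4x4 checkerboard "character": every horizontal neighbor pair differs.
patch = np.indices((4, 4)).sum(axis=0) % 2
features = glcm_features(glcm(patch, dr=0, dc=1, levels=2))
print(features)  # → {'contrast': 1.0, 'energy': 0.5, 'homogeneity': 0.5, 'entropy': 1.0}
```

In a setup like the one described, one such feature vector per character would then be passed to a classifier, such as a support vector machine (for example, scikit-learn's SVC), to decide which printer produced the page.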
The features are observed to migrate with the age of the consumables, indicating that it may be possible to estimate the age of the consumables at the time of printing. In addition, the intrinsic nature of the features makes them difficult to obscure or remove without physically modifying the printer itself. By combining texture features and banding features, it is possible to identify a printer under several attack scenarios. A coding technique for embedding extrinsic signatures in text documents is presented. Both time-domain and frequency-domain signaling and detection schemes are investigated. It is shown that better performance is achieved using a time-domain signaling scheme with a correlation detector, due to the limited length of text character edges. It is also shown that, by treating the document as a communication channel, a coding technique allowing approximately 3600 bits in a full page of 12-point text is achievable with a 7.74% bit error rate. Using this data hiding technique, a counterfeit and tamper detection method based on combinatorial group testing is developed and investigated. The low error rate achievable by the data hiding system allows reliable determination of document authenticity and of the location of tampered data within a document. Based on the results of previous work, a printer dot model is proposed to simulate the printing of cluster-dot halftone patterns. It has been shown that the original parameters chosen for that model do not adequately represent vertical edges in saturated regions such as text. Estimating the parameters by minimizing the error between the simulated and experimental edge profiles and edge sharpness, for both the left and right edges, yields values that more accurately represent the actual edge, with and without embedded signals.
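An idealized model shows why a correlation detector is sensitive to edge length. The sketch makes four assumptions: each bit modulates one character edge of edge_len samples, the modulation uses an antipodal pseudo-random +/-1 reference sequence, print-and-scan distortion is additive Gaussian noise, and the detector knows the reference. The names embed_bit, detect_bit, and simulate_ber are hypothetical, and this is not the dissertation's actual modulation or channel model.

```python
import numpy as np


def embed_bit(bit: int, ref: np.ndarray, amplitude: float = 1.0) -> np.ndarray:
    """Antipodal time-domain signal: bit 1 -> +ref, bit 0 -> -ref (hypothetical model)."""
    return amplitude * (1.0 if bit else -1.0) * ref


def detect_bit(observed: np.ndarray, ref: np.ndarray) -> int:
    """Correlation detector: decide by the sign of the inner product with the known reference."""
    return int(np.dot(observed, ref) > 0)


def simulate_ber(n_bits: int, edge_len: int, noise_std: float, seed: int = 0) -> float:
    """Empirical bit error rate when each bit is carried by one edge of edge_len samples."""
    rng = np.random.default_rng(seed)
    # Known pseudo-random +/-1 reference shared by embedder and detector (assumption).
    ref = rng.choice([-1.0, 1.0], size=edge_len)
    bits = rng.integers(0, 2, size=n_bits)
    errors = 0
    for bit in bits:
        # Additive Gaussian noise stands in for print-and-scan distortion (assumption).
        observed = embed_bit(bit, ref) + rng.normal(0.0, noise_std, size=edge_len)
        errors += int(detect_bit(observed, ref) != bit)
    return errors / n_bits


# Shorter edges give the correlator less signal energy to work with.
for length in (4, 16, 64):
    print(length, simulate_ber(n_bits=2000, edge_len=length, noise_std=2.0))
```

Under this model, the correlator's output signal-to-noise ratio grows linearly with edge_len. The error rate therefore falls quickly as edges get longer, which illustrates why the number of samples available along a character edge matters for detection.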

Degree

Ph.D.

Advisors

Delp, Purdue University.

Subject Area

Electrical engineering; Mechanical engineering; Computer science
