1. With only 274 patches to train on, a deep-learning setup is unable to learn useful features for document tampering detection. We have tried multiple setups, such as different sampling rates, different architectures, and different patch sizes (see the Experiments section).
2. Data augmentation and domain adaptation similar to Yashas's setup for image tampering increase the performance of the model, but not enough to be usable for detection.
3. A combination of non-deep-learning methods (CMFD, JPEG artifacts, and Splicebuster) performs better than the current setup.
4. A better strategy for creating synthetic document tampering should boost performance.
Document tampering detection aims to locate tampered regions in a document. Convolutional networks give good results on image tampering detection, but these networks require large amounts of training data. We therefore want to see whether data augmentation and domain adaptation can yield better performance than previous approaches.
For detection we use the Find-it dataset [4]. The dataset is split into two tasks: classification and detection.
For detection it provides 100 images for training and 80 images for testing.
For classification, the dataset provides 500 (470 pristine and 30 tampered) images for training and 499 (469 pristine and 30 tampered) images for testing.
Figure: examples of tampered and non-tampered images from the dataset.
The table below shows the distribution of tampering types in the dataset.
The first step is to detect the text in the image. For this we use the ctpn-textdetector [3] to extract text regions from the documents, and we then use connected components to extract individual characters. The input to the model is 64x64 patches: using the previously extracted bounding boxes, we crop 64x64 regions around the text, and these patches become the input to our model.
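Below is a minimal sketch of this extraction step, assuming the CTPN boxes are already available as (x, y, w, h) tuples and using OpenCV connected components; the function name `extract_char_patches` is hypothetical and not the project's actual code.

```python
# Minimal sketch of the patch-extraction step, assuming the CTPN text
# detector has already produced text-line bounding boxes (x, y, w, h).
# Names here are illustrative, not the project's code.
import cv2
import numpy as np

PATCH = 64  # model input size

def extract_char_patches(gray_doc, text_boxes):
    """Crop 64x64 patches around each connected component (character)
    found inside the detected text-line boxes."""
    patches = []
    for (x, y, w, h) in text_boxes:
        line = gray_doc[y:y + h, x:x + w]
        # Binarise the text line so characters become foreground components.
        _, binary = cv2.threshold(line, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        n, _, _, centroids = cv2.connectedComponentsWithStats(binary)
        for i in range(1, n):  # label 0 is the background
            cx, cy = centroids[i]
            # Centre a 64x64 window on the character, clamped to the image.
            x0 = int(np.clip(x + cx - PATCH // 2, 0, gray_doc.shape[1] - PATCH))
            y0 = int(np.clip(y + cy - PATCH // 2, 0, gray_doc.shape[0] - PATCH))
            patches.append(gray_doc[y0:y0 + PATCH, x0:x0 + PATCH])
    return patches
```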
A document generally contains a large amount of text, of which only a few regions are tampered, so the data is highly imbalanced. In the Find-it dataset we observed that, on average, only 3.0% of the patches extracted from a document were tampered.
Since there are far fewer tampered patches than non-tampered ones, the first experiment studies the effect of the sampling ratio between the two classes, with the loss function weighted accordingly. After creating patches as described above, the results for three different sampling ratios are shown below (a sketch of the weighted-sampling setup follows the table).
Sampling ratio (tampered : non-tampered) | Validation patch accuracy | Validation F1-score |
---|---|---|
1:1 | 0.657 | 0.603 |
1:5 | 0.680 | 0.624 |
1:10 | 0.664 | 0.608 |
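The sketch below shows one way such a sampling ratio and a matching class weight could be set up in PyTorch; the dataset and label objects are placeholders, not the actual training code.

```python
# Hedged sketch of a 1:k tampered / non-tampered sampling setup in PyTorch.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_loader(dataset, labels, ratio=5, batch_size=128):
    """labels: 0 = non-tampered, 1 = tampered; ratio = k in a 1:k draw ratio."""
    labels = torch.as_tensor(labels)
    n_pos = int((labels == 1).sum())
    n_neg = int((labels == 0).sum())
    # Per-sample weights so expected draws are 1 tampered : k non-tampered.
    weights = torch.empty(len(labels), dtype=torch.double)
    weights[labels == 1] = 1.0 / n_pos
    weights[labels == 0] = ratio / n_neg
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    # Up-weight the rarer tampered class in the loss to compensate the skew.
    criterion = torch.nn.CrossEntropyLoss(weight=torch.tensor([1.0, float(ratio)]))
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler), criterion
```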
We create synthetic tampered images using three types of tampering:
Copy-Paste
Splicing
Inpainting
The Find-it dataset provides 470 pristine images for classification. To create tampered images, we first use the ctpn-textdetector [3] to extract text from the documents and connected components to extract characters. The different types of tampering are then created by replacing text regions. In this way we create a total of 6000 synthetic images (a copy-paste example is sketched below).
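A minimal sketch of the copy-paste variant, assuming character boxes in (x, y, w, h) format; the helper `copy_paste_tamper` is hypothetical, and splicing or inpainting would differ only in where the replacement content comes from (another document, or an inpainting fill such as cv2.inpaint).

```python
# Hedged sketch of copy-paste synthetic tampering on a pristine document.
import random
import numpy as np

def copy_paste_tamper(doc, char_boxes):
    """doc: grayscale document image; char_boxes: (x, y, w, h) character boxes.
    Returns the tampered image and a binary ground-truth mask."""
    h_img, w_img = doc.shape[:2]
    tampered = doc.copy()
    mask = np.zeros((h_img, w_img), dtype=np.uint8)
    src, dst = random.sample(char_boxes, 2)
    sx, sy, sw, sh = src
    # Clamp the destination so the pasted region stays inside the image.
    dx = min(dst[0], w_img - sw)
    dy = min(dst[1], h_img - sh)
    tampered[dy:dy + sh, dx:dx + sw] = doc[sy:sy + sh, sx:sx + sw]
    mask[dy:dy + sh, dx:dx + sw] = 255
    return tampered, mask
```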
As seen in the figure below, the source model and the target model are trained simultaneously, and the weights of the CNN block are shared between them. The outputs are then passed through two fully connected layers to obtain a 256-dimensional representation for the source and target images. For domain adaptation, Yashas defines an MMD loss between the two representations:

$$\mathcal{L}_{\mathrm{MMD}} = \left\lVert \frac{1}{|X_s|}\sum_{x_s \in X_s} \Phi(x_s) - \frac{1}{|X_t|}\sum_{x_t \in X_t} \Phi(x_t) \right\rVert^2$$

where $x_s$ and $x_t$ are the features of the source and target images respectively, and $\Phi(x_s)$ and $\Phi(x_t)$ are computed by passing the representations through a Gaussian kernel.
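A minimal PyTorch sketch of such a Gaussian-kernel MMD loss is shown below; the multi-bandwidth kernel here is a common heuristic and not necessarily the exact kernel used in Yashas's setup.

```python
# Squared MMD between source and target feature batches with an RBF kernel.
import torch

def gaussian_kernel(a, b, sigmas=(1.0, 5.0, 10.0)):
    """Pairwise RBF kernel values between rows of a (n, d) and b (m, d)."""
    d2 = torch.cdist(a, b) ** 2  # squared Euclidean distances
    return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in sigmas)

def mmd_loss(source_feat, target_feat):
    """Biased estimator of the squared MMD between the two 256-dim batches."""
    k_ss = gaussian_kernel(source_feat, source_feat).mean()
    k_tt = gaussian_kernel(target_feat, target_feat).mean()
    k_st = gaussian_kernel(source_feat, target_feat).mean()
    return k_ss + k_tt - 2.0 * k_st
```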
Number of synthetic patches | Source training F1-score | Source validation F1-score | Target training F1-score | Target validation F1-score |
---|---|---|---|---|
No synthetic data | - | - | 0.680 | 0.624 |
2000 patches | 0.968 | 0.951 | 0.962 | 0.734 |
6000 patches | 0.980 | 0.971 | 0.973 | 0.782 |
Figure: training curves; validation accuracy on the Find-it images is shown in pink.
We further tested whether changing the architecture would improve performance. From the table below we can infer that ResNet-style models perform slightly better than the VGG-style setup used by Yashas (a sketch of the ResNet-18 patch classifier follows the table).
Number of convolution blocks | Source training F1-score | Source validation F1-score | Target training F1-score | Target validation F1-score |
---|---|---|---|---|
3 blocks | 0.902 | 0.898 | 0.87 | 0.702 |
5 blocks (Yashas's setup) | 0.973 | 0.961 | 0.953 | 0.762 |
18 blocks (ResNet-18 setup) | 0.980 | 0.971 | 0.973 | 0.782 |
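As an illustration of the ResNet-style variant, the sketch below adapts a torchvision ResNet-18 to 64x64 single-channel patches with two output classes; the stem modifications are assumptions, and the exact model used for the numbers above may differ.

```python
# Hedged sketch of a ResNet-18 patch classifier for 64x64 grayscale patches.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_patch_classifier(in_channels=1, num_classes=2):
    model = resnet18(weights=None)  # train from scratch
    # Smaller stem: a 7x7/stride-2 conv plus max-pool loses too much
    # resolution on 64x64 inputs, so use a 3x3/stride-1 conv and drop the pool.
    model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1,
                            padding=1, bias=False)
    model.maxpool = nn.Identity()
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

if __name__ == "__main__":
    net = build_patch_classifier()
    print(net(torch.randn(8, 1, 64, 64)).shape)  # torch.Size([8, 2])
```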
For testing we take the model with the highest average IoU over the validation data. We compare it with the setup used by Verdoliva, which combines three techniques: (a) CMFD for CPI, (b) Splicebuster for CPO, and (c) JPEG artifact analysis. We take the union of all three masks and compare it with the ground truth to evaluate against their results (a sketch of this evaluation follows the table).
Method | Average IoU on test data |
---|---|
Without augmentation | 0.00138 |
With augmentation | 0.00178 |
CMFD + Noiseprint + JPEG artifacts | 0.516 |
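For reference, a small sketch of the mask-union and IoU evaluation described above; mask variable names are placeholders.

```python
# Union of per-method tampering masks and IoU against the ground truth.
import numpy as np

def union_mask(masks):
    """Pixel-wise OR of a list of binary masks with the same shape."""
    return np.clip(np.sum(masks, axis=0), 0, 1)

def iou(pred, gt):
    """Intersection-over-union between a binary prediction and ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union > 0 else 0.0
```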
For more results, see this link.