gafforelli  
#1 Posted : Saturday, April 7, 2018 7:30:05 PM(UTC)
gafforelli

Rank: Newbie

Groups: Registered
Joined: 4/7/2018(UTC)
Posts: 5
Brazil
Location: Santo Antonio da Patrulha

Thanks: 1 times
Hi,

Is it possible to reduce the size of a searchable PDF file after it has been converted from TIFF?

Thanks
Paul Rayman  
#2 Posted : Sunday, April 8, 2018 5:34:57 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 1,004

Thanks: 5 times
Was thanked: 121 time(s) in 118 post(s)
Hi,

I think it is possible.
There is no method called OptimizePdfAfterOCR, but you can write a simple program that reduces the size of the images in the searchable PDF.

For example, the PDF document generated from the multipage TIFF on this page https://tesseract.pataga...e6-b6e9-9df777c4678c.htm
can be reduced from 1 MB to 500 KB using the following code:

Code:

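// Namespaces this snippet appears to need (assumed: System.Drawing plus the Pdfium.Net SDK namespaces)
using System.Drawing;
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Net.BasicTypes;

static PdfBitmap RescaleImage(PdfBitmap src, int newWidth, int newHeight)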
{
    PdfBitmap bitmap = new PdfBitmap(newWidth, newHeight, true);
    using (var g = Graphics.FromImage(bitmap.Image))
    {
        g.DrawImage(src.Image, 0, 0, newWidth, newHeight);
    }
    return bitmap;
}

static void Main(string[] args)
{
    PdfCommon.Initialize();
    using (var doc = PdfDocument.Load(@"d:\0\multipage.pdf"))
    {
        foreach (var page in doc.Pages)
        {
            foreach(var img in page.PageObjects)
            {
                if (!(img is PdfImageObject))
                    continue;
                var imageObject = (img as PdfImageObject);
                //Rescale found image to reduce its size
                var newBitmap = RescaleImage(imageObject.Bitmap, imageObject.Bitmap.Width / 2, imageObject.Bitmap.Height / 2);
                //Replace old bitmap with reduced one.
                imageObject.Bitmap = newBitmap;
                //Generate the image stream in the page resources. The stream is written to the page resource dictionary under the FXX1 key
                Pdfium.FPDFImageObj_GenerateStream(imageObject.Handle, page.Handle);

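                //Replace the old image stream in the page resources with the newly created stream.
                //Note: "Im1" and "FXX1" are the resource names used with this sample document; other PDFs may use different keys.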
                var xObject = page.Dictionary["Resources"].As<PdfTypeDictionary>()["XObject"].As<PdfTypeDictionary>();
                xObject["Im1"] = xObject["FXX1"].Clone();
                xObject.Remove("FXX1");
                imageObject.Dispose();
            }
        }
        doc.Save(@"d:\0\multipage_rescaled.pdf", Patagames.Pdf.Enums.SaveFlags.NoIncremental);
    }
}


This code uses the Pdfium.Net SDK to access the PDF's internal dictionaries. You can download it here:
https://pdfium.patagames.com/downloads/
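
As a side note on the halving step in the code above, here is a minimal sketch of the arithmetic. Halving both the width and the height leaves a quarter of the pixels, and plain integer division such as Width / 2 gives 0 for a 1-pixel image. ScaleDimension and ScaleMath are hypothetical names used only for illustration (they are not part of the Pdfium.Net SDK), and the page size is an assumed example.

```csharp
using System;

static class ScaleMath
{
    // Hypothetical helper, not part of Pdfium.Net SDK: scales one dimension
    // by the given factor and never returns less than 1 pixel.
    public static int ScaleDimension(int size, double factor)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (factor <= 0) throw new ArgumentOutOfRangeException(nameof(factor));
        return Math.Max(1, (int)Math.Round(size * factor));
    }

    public static void Main()
    {
        // Assumed example page: 2480 x 3508 pixels.
        int width = 2480, height = 3508;
        int newWidth = ScaleDimension(width, 0.5);
        int newHeight = ScaleDimension(height, 0.5);

        long before = (long)width * height;
        long after = (long)newWidth * newHeight;

        // Percentage of the original pixel count that remains after rescaling.
        Console.WriteLine(newWidth + " x " + newHeight + ", " + (after * 100 / before) + "% of the pixels");
    }
}
```

The values returned by ScaleDimension could be passed to RescaleImage in place of Bitmap.Width / 2 and Bitmap.Height / 2. The final file size also depends on how the image stream is compressed, so the saving on disk will not necessarily match the pixel ratio.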

Edited by user Sunday, April 8, 2018 5:42:03 AM(UTC)  | Reason: Not specified

gafforelli  
#3 Posted : Thursday, April 12, 2018 1:55:16 AM(UTC)

Hi Paul,
Thank you very much for the quick reply.
I will look into using Pdfium.