Rank: Newbie
Groups: Registered
Joined: 4/7/2018(UTC) Posts: 5 Location: Santo Antonio da Patrulha Thanks: 1 times
Hi,
Is it possible to reduce the size of a searchable PDF after it has been converted from a TIFF file?
Thanks
Rank: Administration
Groups: Administrators
Joined: 1/5/2016(UTC) Posts: 1,102
Thanks: 7 times Was thanked: 128 time(s) in 125 post(s)
Hi, I think it is possible. There is no built-in method called OptimizePdfAfterOCR, but you can write a simple program that reduces the size of the images in the searchable PDF. For example, the PDF document generated from the multipage TIFF on this page https://tesseract.pataga...e6-b6e9-9df777c4678c.htm may be reduced from 1 MB to 500 KB using the following code.
Code:
using System.Drawing;
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Net.BasicTypes;

class Program
{
    // Draws the source bitmap into a new PdfBitmap of the requested size.
    static PdfBitmap RescaleImage(PdfBitmap src, int newWidth, int newHeight)
    {
        PdfBitmap bitmap = new PdfBitmap(newWidth, newHeight, true);
        using (var g = Graphics.FromImage(bitmap.Image))
        {
            g.DrawImage(src.Image, 0, 0, newWidth, newHeight);
        }
        return bitmap;
    }

    static void Main(string[] args)
    {
        PdfCommon.Initialize();
        using (var doc = PdfDocument.Load(@"d:\0\multipage.pdf"))
        {
            foreach (var page in doc.Pages)
            {
                foreach (var obj in page.PageObjects)
                {
                    // Skip everything that is not an image (text, paths, etc.)
                    var imageObject = obj as PdfImageObject;
                    if (imageObject == null)
                        continue;
                    // Rescale the found image to half its width and height
                    var newBitmap = RescaleImage(imageObject.Bitmap, imageObject.Bitmap.Width / 2, imageObject.Bitmap.Height / 2);
                    // Replace the old bitmap with the reduced one
                    imageObject.Bitmap = newBitmap;
                    // Generate the image stream in the page resources. The stream is written
                    // to the page's resource dictionary under the FXX1 key
                    Pdfium.FPDFImageObj_GenerateStream(imageObject.Handle, page.Handle);
                    // Replace the old image stream in the page resources with the newly created one.
                    // The keys "Im1" and "FXX1" are hard-coded, so this appears to assume
                    // one image per page, stored under "Im1"
                    var xObject = page.Dictionary["Resources"].As<PdfTypeDictionary>()["XObject"].As<PdfTypeDictionary>();
                    xObject["Im1"] = xObject["FXX1"].Clone();
                    xObject.Remove("FXX1");
                    imageObject.Dispose();
                }
            }
            // Write a complete new file rather than an incremental update
            doc.Save(@"d:\0\multipage_rescaled.pdf", Patagames.Pdf.Enums.SaveFlags.NoIncremental);
        }
    }
}
This code uses the Pdfium.Net SDK to access the PDF's internal dictionaries. You can download it here: https://pdfium.patagames.com/downloads/
Edited by user Sunday, April 8, 2018 5:42:03 AM (UTC)
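The code in the post halves each image's width and height, which leaves a quarter of the original pixels; how much the file actually shrinks depends on how the image stream is compressed. If a different reduction is needed, the target size could be computed by a small helper such as the sketch below. ImageScaling and ScaledSize are hypothetical names, not part of the Pdfium.Net SDK, and the helper relies only on System. It truncates like the integer division in the post, and additionally keeps each dimension at 1 pixel or more, since dividing a 1-pixel dimension by 2 would give 0.

```csharp
using System;

// Hypothetical helper, not part of Pdfium.Net SDK.
static class ImageScaling
{
    // Returns the target pixel size for a scale factor in (0, 1].
    // The result is truncated toward zero, like the "/ 2" in the post,
    // and each dimension is clamped so it never drops below 1.
    public static (int Width, int Height) ScaledSize(int width, int height, double scale)
    {
        if (width <= 0)
            throw new ArgumentOutOfRangeException(nameof(width), "Width must be positive.");
        if (height <= 0)
            throw new ArgumentOutOfRangeException(nameof(height), "Height must be positive.");
        if (scale <= 0 || scale > 1)
            throw new ArgumentOutOfRangeException(nameof(scale), "Scale must be in (0, 1].");

        int newWidth = Math.Max(1, (int)(width * scale));
        int newHeight = Math.Max(1, (int)(height * scale));
        return (newWidth, newHeight);
    }
}
```

The returned Width and Height could then be passed to RescaleImage in place of Bitmap.Width / 2 and Bitmap.Height / 2. Scaling down too far may make scanned text hard to read, so it is worth checking the output visually. The tuple return type assumes C# 7.0 or later.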
1 user thanked Paul Rayman for this useful post.
Rank: Newbie
Groups: Registered
Joined: 4/7/2018(UTC) Posts: 5 Location: Santo Antonio da Patrulha Thanks: 1 times
Hi Paul, thank you very much for the quick reply. I will look into using Pdfium.