rmbarbosa  
#1 Posted : Sunday, May 19, 2019 11:13:38 PM(UTC)
rmbarbosa

Rank: Newbie

Groups: Registered
Joined: 5/19/2019(UTC)
Posts: 2
Portugal

This code leaks memory when the PDF contains image objects.
The leak does not occur if the page is rendered instead.
To verify this I used a PDF file with exactly one image embedded in each page (the document has 20 pages, so 20 embedded images).
As you can see, I run the extraction 30 times, and the memory usage gradually increases.

I managed to track it down to
Patagames.Pdf!Patagames.Pdf.Pdfium.FPDFImageObj_GetCloneBitmap_native( IntPtr )
but I can't trace any further into native code.

It seems there is a handle that is not being properly disposed.
I've highlighted the core zone where the leak happens.
I checked with the Visual Studio memory profiler and confirmed it with another profiler: both show leaks.

Could you please verify?
I'm very interested in buying, but this is a problem for me.

Code:
static void Main(string[] args)
        {

            PdfCommon.Initialize();
            var pdfFilename = @"C:\............\doc20120514150242.pdf";
            for (int i = 0; i < 30; i++)
            {
                PdfToBitmap(pdfFilename);
                if (i % 5 == 0)
                {
                    Console.WriteLine("Press Enter to continue.");
                    Console.ReadLine();
                }
                   
            }


            Console.WriteLine($"Completed.");
            Console.ReadLine();
        }

        private static void PdfToBitmap(string pdfFilename)
        {
            Console.WriteLine($"Extracting images from pdf file: {pdfFilename}");

            var tfolder = @"C:\mytempfolder\pdfs";
            if (!Directory.Exists(tfolder))
                Directory.CreateDirectory(tfolder);

            // Load the PDF document. For each page: if it holds exactly one image
            // and no text, save that image directly; otherwise render the page
            // at 300 DPI.
            using (var doc = PdfDocument.Load(pdfFilename)) // C# Read PDF Document
            {
                var pageIndex = 0;
                foreach (var page in doc.Pages)
                {
                    var dt = DateTime.Now;

                    var tfile = Path.Combine(tfolder, $"{Path.GetFileNameWithoutExtension(pdfFilename)}_{pageIndex}.png");

                    var imgObjects = ExtractImagesFromPage(page);

                    var text = page.Text;
     
                    var renderPage = true;
                    if (imgObjects.Count == 1 && text.CountChars == 0)
                    {
                        var pdfBitmap = imgObjects[0].GetBitmap(); // highlighted: leak observed from here

                        pdfBitmap.Image.Save(tfile, ImageFormat.Png);

                        pdfBitmap.Dispose(); // highlighted: to here
                       
                        renderPage = false;
                    }

                    foreach (var item in imgObjects)
                        item.Dispose();


                    if (renderPage)
                    {
                        int width = (int)(page.Width / 72.0 * 300);
                        int height = (int)(page.Height / 72.0 * 300);
                        using (var bitmap = new PdfBitmap(width, height, true))
                        {
                            bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
                            page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_LCD_TEXT);

                            bitmap.Image.Save(tfile, ImageFormat.Png);
                        }
                    }

                    var msecs = (DateTime.Now - dt).TotalMilliseconds;
                    Console.WriteLine($"Page {pageIndex} Took {msecs} msecs");
                    pageIndex++;

                    page.Dispose();
                }

                // doc is disposed by the enclosing using block
            }
        }

        private static List<PdfImageObject> ExtractImagesFromPage(PdfPage page)
        {
            var imageObjects = new List<PdfImageObject>();
            //Enumerate all objects on a page
            foreach (var obj in page.PageObjects)
            {
                var imageObject = obj as PdfImageObject;
                if (imageObject == null)
                {
                    imageObject.Dispose(); // BUG: imageObject is null here, so this throws NullReferenceException
                    continue; //if not an image object then nothing to do
                }
                imageObjects.Add(imageObject);
            }

            return imageObjects;
        }
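
A side note on the highlighted block: it calls Dispose() by hand, so the bitmap would stay undisposed if Save ever threw. That is not the cause of the growth reported here (no exception occurs), but wrapping the bitmap in a using block removes the risk. A minimal sketch, assuming a stand-in class: `FakeBitmap` and `UsingDemo` are hypothetical names invented for this illustration and are not part of the Patagames library.

```csharp
using System;

// Hypothetical stand-in for a disposable bitmap wrapper; it only records
// whether Dispose() was called so the behaviour can be observed.
public sealed class FakeBitmap : IDisposable
{
    public bool Disposed { get; private set; }

    public void Save(bool fail)
    {
        if (fail) throw new InvalidOperationException("simulated save failure");
    }

    public void Dispose() { Disposed = true; }
}

public static class UsingDemo
{
    // Saves inside a using block; returns true if the bitmap was disposed.
    public static bool SaveWithUsing(bool fail)
    {
        var bitmap = new FakeBitmap();
        try
        {
            using (bitmap)
            {
                bitmap.Save(fail);
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the demo can report the outcome.
        }
        return bitmap.Disposed;
    }

    // Mirrors the manual pattern: Dispose() is skipped when Save throws.
    public static bool SaveWithManualDispose(bool fail)
    {
        var bitmap = new FakeBitmap();
        try
        {
            bitmap.Save(fail);
            bitmap.Dispose();
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the demo can report the outcome.
        }
        return bitmap.Disposed;
    }
}
```

In the stand-in, the using variant disposes the bitmap whether or not Save fails, while the manual variant disposes it only on the success path.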

Edited by user Sunday, May 19, 2019 11:20:36 PM(UTC)  | Reason: Not specified

Paul Rayman  
#2 Posted : Monday, May 20, 2019 8:32:12 PM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 743

Thanks: 1 times
Was thanked: 90 time(s) in 89 post(s)

I checked your code and found that it works fine, except for the following place:
Code:
                var imageObject = obj as PdfImageObject;
                if (imageObject == null)
                {
                    imageObject.Dispose();
                    continue; //if not an image object then nothing to do
                }


imageObject is null in that branch, so I simply removed the Dispose() call.
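
Another way to avoid the null case altogether is to filter by type up front, so non-image objects are never touched. A minimal sketch, assuming stand-in classes: `PageObject`, `TextObject`, `ImageObject` and `ImageFilter` are hypothetical and declared only so the snippet compiles on its own; they are not the library's types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the library's page-object types,
// declared here only so the sketch compiles on its own.
public class PageObject { }
public class TextObject : PageObject { }
public class ImageObject : PageObject { }

public static class ImageFilter
{
    // Returns only the image objects. Non-image objects are skipped
    // without being dereferenced, so there is no null to call Dispose() on.
    public static List<ImageObject> ExtractImages(IEnumerable<PageObject> pageObjects)
    {
        return pageObjects.OfType<ImageObject>().ToList();
    }
}
```

With the real library the same idea would presumably apply to `page.PageObjects` and `PdfImageObject`, provided that collection can be enumerated as an `IEnumerable`, which this sketch assumes rather than verifies.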

I see no memory leak. My result:
memleak01.png (31kb) downloaded 1 time(s).


Could you please provide a console app in which this issue is reproduced?
The PDF document would also be helpful.

Edited by user Monday, May 20, 2019 8:33:37 PM(UTC)  | Reason: Not specified

rmbarbosa  
#3 Posted : Tuesday, May 21, 2019 9:17:52 AM(UTC)

Thank you for the reply.
Indeed, the bug you pointed out is real,
but that's not the problem.

I'm sending you some screenshots and a demo PDF.

I start with a first run that extracts the images to a fixed folder.

As you can see in the attached files,
in the first screenshot, taken after completing 4 saves and exiting the document processing, the memory usage is 37MB.

p1.jpg (85kb) downloaded 4 time(s).

Then I run the same process 5 more times and stop again: the memory usage is now 42MB.

p2.jpg (121kb) downloaded 3 time(s).

If I let it run all the way to the end, after 30 executions the memory usage is 55MB.

p3.jpg (125kb) downloaded 4 time(s).

The tendency is upward.

So there is a memory leak: I dispose the document after each run, so the memory usage should be stable.

You can see it clearly in the heap memory usage in the snapshots I took at each of the points mentioned above.


p4.jpg (88kb) downloaded 0 time(s).
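
For anyone who wants to repeat the measurement without a profiler, the managed heap and the process's private bytes can be sampled separately after each run to see which one grows. A minimal sketch using only standard .NET APIs; `MemoryProbe`, `Sample`, `Take` and `Measure` are names made up for this illustration and are not part of the Patagames library.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // One sample of managed heap size and process private bytes.
    public sealed class Sample
    {
        public Sample(long managedBytes, long privateBytes)
        {
            ManagedBytes = managedBytes;
            PrivateBytes = privateBytes;
        }

        public long ManagedBytes { get; }
        public long PrivateBytes { get; }
    }

    public static Sample Take()
    {
        // Force a collection and run finalizers first, so garbage that is
        // merely waiting to be collected is not mistaken for a leak.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long managed = GC.GetTotalMemory(forceFullCollection: true);
        using (var process = Process.GetCurrentProcess())
        {
            process.Refresh();
            return new Sample(managed, process.PrivateMemorySize64);
        }
    }

    // Runs the workload `rounds` times and prints one sample per round.
    public static Sample[] Measure(Action workload, int rounds)
    {
        var samples = new Sample[rounds];
        for (int i = 0; i < rounds; i++)
        {
            workload();
            samples[i] = Take();
            Console.WriteLine(
                $"Round {i + 1}: managed={samples[i].ManagedBytes / 1024} KB, " +
                $"private={samples[i].PrivateBytes / 1024} KB");
        }
        return samples;
    }
}
```

Calling `MemoryProbe.Measure(() => PdfToBitmap(pdfFilename), 30)` would print one line per run. If private bytes keep rising while the managed figure stays flat, that would point toward unmanaged allocations, though it is not proof on its own.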

Moreover,
if you comment out lines 49 to 55 of the code (the body of the `if (imgObjects.Count == 1 && text.CountChars == 0)` branch) and run the same process, you will notice that the memory stays stable, as it should.

I'm using Microsoft Visual Studio Enterprise 2017
Version 15.9.11
VisualStudio.15.Release/15.9.11+28307.586
Microsoft .NET Framework
Version 4.7.03056
Installed Version: Enterprise


The version of your DLL that I'm using, via NuGet, is the latest: v4.7.2704

I've attached the screenshots and the sample PDF document that I used to reproduce this problem.
There is also a zip of my bare-bones console code as a VS project.

Doc1.pdf (794kb) downloaded 2 time(s).
testPdfiumConsoleApp.zip (5kb) downloaded 1 time(s).

Edited by user Tuesday, May 21, 2019 9:35:22 AM(UTC)  | Reason: Not specified

Paul Rayman  
#4 Posted : Tuesday, June 4, 2019 10:54:30 AM(UTC)
Thank you for the detailed report.
I have passed this issue to the core dev team.
We will release a fix soon.

Paul Rayman  
#5 Posted : Saturday, June 15, 2019 8:07:25 AM(UTC)