adammckay  
#1 Posted : Wednesday, June 29, 2022 12:08:43 PM(UTC)
adammckay

Rank: Newbie

Groups: Registered
Joined: 6/22/2022(UTC)
Posts: 9
United States
Location: Pennsylvania

I am using the code straight from the Patagames homepage with DLL version 4.66.2704, and I can confirm that it causes a memory leak with a large enough PDF. Even with the text extraction call commented out, the leak still occurs. Is there a different DLL I can use, or perhaps a workaround in code?


Code:
using (var doc = PdfDocument.Load(@"c:\test001.pdf")) // C# Read PDF File
{
    foreach (var page in doc.Pages)
    {
        // Gets the number of characters in a page, or -1 on error.
        // Generated characters, such as additional spaces and newlines, are also counted.
        int totalCharCount = page.Text.CountChars;

        // Extract text from the page into a string.
        // Even with the next line commented out, I'll get a leak.
        //string text = page.Text.GetText(0, totalCharCount);

        page.Dispose();
    }
}
Paul Rayman  
#2 Posted : Friday, July 15, 2022 7:42:36 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 1,035

Thanks: 5 times
Was thanked: 122 time(s) in 119 post(s)
The memory will be freed when the document is disposed. This is not a memory leak; it is normal memory consumption while the parser is running. Some parser data is stored in the document object to optimize page loading speed.
In addition, some objects are shared between pages. They are also stored at document scope.
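
One way to tell retained per-document data apart from a true leak is to compare process memory before loading, while the document is open, and after it is disposed. The following is a minimal sketch, not part of the Patagames API: `MemoryProbe` and its members are hypothetical names, and it reads the process's private bytes because native allocations do not show up in the managed heap. It assumes C# 7 or later for the tuple return type.

```csharp
using System;
using System.Diagnostics;

static class MemoryProbe
{
    // Private bytes of the current process, in megabytes.
    // Native allocations are included here, unlike GC.GetTotalMemory,
    // which only reports the managed heap.
    public static long PrivateMegabytes()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return process.PrivateMemorySize64 / (1024 * 1024);
        }
    }

    // Opens a disposable resource, runs `work` on it, and samples memory
    // before opening, just before disposing, and after disposing.
    public static (long before, long during, long after) Measure<T>(
        Func<T> open, Action<T> work) where T : IDisposable
    {
        long before = PrivateMegabytes();
        long during;
        using (T resource = open())
        {
            work(resource);
            during = PrivateMegabytes();
        }

        // Let pending finalizers run so managed wrappers release native handles.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        long after = PrivateMegabytes();
        return (before, during, after);
    }
}
```

With the library, `open` would be something like `() => PdfDocument.Load(path)` and `work` the page loop from the first post. If `after` falls back close to `before`, the growth was held by the document rather than leaked. The operating system may not return freed memory immediately, so the numbers are only indicative.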
adammckay  
#3 Posted : Friday, July 15, 2022 9:00:58 AM(UTC)

I see what you are saying. Unfortunately, we receive a LOT of PDFs with high page counts, and just looping through the pages causes the process to run out of memory.

Our solution (not an elegant one) is that when we do have to do page looping, we dispose/reload the document every 500 pages. I wish I could send the triggering document but unfortunately, it is not allowed to be sent out.
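
The dispose/reload workaround described above could look roughly like the sketch below. It is written against delegates so it does not depend on any particular PDF library; `BatchedPageLoop`, `Run`, and the parameter names are hypothetical, and the default batch size of 500 simply mirrors the number mentioned in the post.

```csharp
using System;

static class BatchedPageLoop
{
    // Visits pages 0..pageCount-1, disposing and reopening the document
    // after every `batchSize` pages so per-document caches are released.
    // Returns the number of times the document was opened.
    public static int Run<TDoc>(
        Func<TDoc> open,
        Func<TDoc, int> getPageCount,
        Action<TDoc, int> processPage,
        int batchSize = 500) where TDoc : IDisposable
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int start = 0;
        int pageCount = 0;
        int batches = 0;
        do
        {
            using (TDoc doc = open()) // fresh document instance for each batch
            {
                pageCount = getPageCount(doc);
                int end = Math.Min(start + batchSize, pageCount);
                for (int i = start; i < end; i++)
                    processPage(doc, i);
                start = end;
            }
            batches++;
        } while (start < pageCount);

        return batches;
    }
}

// Possible use with the SDK (assumes doc.Pages exposes Count and an int indexer):
// BatchedPageLoop.Run(
//     () => PdfDocument.Load(@"c:\test001.pdf"),
//     doc => doc.Pages.Count,
//     (doc, i) => { using (var page = doc.Pages[i]) { int n = page.Text.CountChars; } });
```

Reopening the file costs some parsing time per batch, so the batch size is a trade-off between peak memory and throughput.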
Paul Rayman  
#4 Posted : Saturday, July 16, 2022 11:46:04 AM(UTC)
Perhaps this case is worth investigating.
Can you send a test console application and one of these documents to support@patagames.com?
adammckay  
#5 Posted : Monday, July 18, 2022 1:20:08 PM(UTC)

Here is what I found. I simply used the example from the Patagames website to reproduce the problem (see "Read PDF File and Extract Text From it in C#" on the Patagames site).

I am unable to send the PDF that initially triggered the issue because it is sensitive to our organization.

In attempting to create a triggering document, I did find a partial solution. My Visual Studio solution was set to 'Prefer 32-bit' in Compile Properties, which means the process was limited to a 32-bit address space and could not hold more than 4 GB of memory. Once I forced the application to run as 64-bit, the problem went away (though the PDF did consume about 6 GB of memory). Our solution going forward will probably be to handle out-of-memory conditions in code, because a document that needs more memory than is available on our server should be extremely rare.
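
To catch this kind of misconfiguration early, an application can check its own bitness at startup. This is a small sketch using standard .NET properties; the `BitnessCheck` class and its method names are hypothetical.

```csharp
using System;

static class BitnessCheck
{
    // Human-readable summary of process and OS bitness.
    public static string Describe()
    {
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        string os = Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";
        return $"process: {process}, OS: {os}, pointer size: {IntPtr.Size} bytes";
    }

    // True when the process runs with a 32-bit address space, which caps
    // usable memory no matter how much RAM the machine has.
    public static bool IsAddressSpaceConstrained()
    {
        return !Environment.Is64BitProcess;
    }
}
```

Logging `BitnessCheck.Describe()` once at startup, and warning when `IsAddressSpaceConstrained()` returns true before opening a large PDF, makes the cause visible if an out-of-memory error follows. In .NET Framework projects the 'Prefer 32-bit' checkbox corresponds to the `Prefer32Bit` MSBuild property in the project file.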