Patagames Software Support Forum » Pdfium.Net SDK » FAQ
How to search for text in a PDF file and return the coordinates if the text exists?
Rank: Administration
Question: I am trying to search for text in a PDF file and return its coordinates if the text exists. While researching online, I found out that this can be done with the Pdfium.Net SDK. Could you please provide some examples of how to do that?
Answer: Please look at the code below.
Code:
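using Patagames.Pdf.Enums;
using Patagames.Pdf.Net;

//Open PDF document
using (var doc = PdfDocument.Load(@"d:\0\test_big.pdf"))
{
    //Enumerate pages
    foreach (var page in doc.Pages)
    {
        //Start the search at character index 0 of the page text
        var found = page.Text.Find("text for search", FindFlags.None, 0);
        if (found != null)
        {
            do
            {
                //A match can span several rectangles (for example, across lines)
                var textInfo = found.FindedText;
                foreach (var rect in textInfo.Rects)
                {
                    float x = rect.left;
                    float y = rect.top;
                    //...
                }
            } while (found.FindNext());
        }
        //Release the page once it has been processed
        page.Dispose();
    }
}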
You can also use the PdfSearch class for an asynchronous search.
Code:
//Open PDF document
var doc = PdfDocument.Load(@"d:\0\test_big.pdf");
PdfSearch search = new PdfSearch(doc);

//Raised for each match found
search.FoundTextAdded += (s, e) =>
{
    var textInfo = doc.Pages[e.FoundText.PageIndex].Text.GetTextInfo(e.FoundText.CharIndex, e.FoundText.CharsCount);
    foreach (var rect in textInfo.Rects)
    {
        float x = rect.left;
        float y = rect.top;
        Console.WriteLine(string.Format("Found text: {0}, Page = {1}, x= {2}, y={3}", textInfo.Text, e.FoundText.PageIndex, x, y));
        //...
    }
};

//Release the document once the search has finished
search.SearchCompleted += (s, e) =>
{
    doc.Dispose();
};

search.SearchProgressChanged += (s, e) =>
{
    Console.WriteLine(string.Format("Progress: {0}%", e.ProgressPercentage));
};

search.Start("document", FindFlags.MatchWholeWord);

//Keep the console application alive while the search runs
Console.ReadLine();
Edited by user Wednesday, November 7, 2018 9:50:21 PM(UTC)
Rank: Member
Hello, I tried your first code sample, but it throws a StackOverflowException, which I think I have handled. My problem now is that ScrollToPoint is not precise at all: it zooms in far away from the word I'm looking for! Here are the two functions I'm working on:
Code:
public void highligtText(string text)
{
    int cnt = this.pdfViewer.Document.Pages.Count;
    for (int i = 0; i < cnt; i++)
    {
        var found = this.pdfViewer.Document.Pages[i].Text.Find(text, Patagames.Pdf.Enums.FindFlags.None, 0); //1)
        if (found == null)
            continue;
        do
        {
            try
            {
                zoomRecherche(text, found);
            }
            catch (StackOverflowException e)
            {
                if (e.Source != null)
                    Console.WriteLine("IOException source: {0}", e.Source);
                throw;
            }
            this.pdfViewer.HighlightText(i, found.CharIndex, found.CharsCount, System.Windows.Media.Color.FromArgb(90, 219, 0, 25));
        } while (found.FindNext());
    }
}

/// <summary>
/// Zoom in on the searched element
/// </summary>
public void zoomRecherche(string text, dynamic found)
{
    int idx = pdfViewer.CurrentIndex;
    if (idx >= 0)
    {
        var textInfo = found.FindedText;
        foreach (var rect in textInfo.Rects)
        {
            Point p = rect.Position;
            // Go to the position of point p
            this.pdfViewer.ScrollToPoint(idx, p);
        }
        pdfViewer.Zoom = 2f;
    }
}
Do you have any idea how I can zoom in accurately on the word I'm looking for? Maybe there is a function that decides how the PDF is opened, and that makes the opening a bit messy? Thanks
Edited by user Tuesday, February 27, 2018 8:39:56 AM(UTC)
Rank: Administration
Hi, it looks like the following thread will be helpful for you: http://forum.patagames.c...or-Position-on-PdfViewer
Update: a simpler solution may be acceptable, though. Just change the order of your actions:
1. First, zoom the page.
2. Then call the ScrollToPoint method.
Edited by user Monday, February 26, 2018 9:02:01 AM(UTC)
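The reordering suggested in this reply could look roughly like the sketch below. It is a fragment, not a tested solution: it reuses only the members that already appear in the question (pdfViewer.Zoom, pdfViewer.ScrollToPoint, found.FindedText, Rects, rect.left, rect.top). The method name ZoomThenScroll and the pageIndex parameter are hypothetical. It also assumes that Point is System.Windows.Point, and that the page index of the match (the loop variable i in highligtText) is the right value to pass instead of pdfViewer.CurrentIndex.

```csharp
// Sketch only: meant to live in the same class as highligtText,
// where pdfViewer is the viewer control from the question.
// ZoomThenScroll and pageIndex are hypothetical names.
public void ZoomThenScroll(int pageIndex, dynamic found)
{
    // 1. Zoom first, so the viewer lays out the pages at the final scale.
    pdfViewer.Zoom = 2f;

    // 2. Then scroll to the first rectangle of the match.
    var textInfo = found.FindedText;
    foreach (var rect in textInfo.Rects)
    {
        // Assumption: rect.left / rect.top are page coordinates accepted by ScrollToPoint.
        var p = new System.Windows.Point(rect.left, rect.top);
        pdfViewer.ScrollToPoint(pageIndex, p);
        break; // one scroll target per match is enough
    }
}
```

Inside the search loop this would be called as ZoomThenScroll(i, found) in place of zoomRecherche(text, found).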
Rank: Member
Thanks for the answer. I made your modifications, but the word I'm looking for is still never in the window (although I'm sure that this code should work). I believe that something else (another padding or function) is getting in the way of precise positioning. Do you have any idea what I should be looking for? Thanks anyway, you do a great job!!! :)
Edited by user Tuesday, February 27, 2018 9:01:31 AM(UTC)
Rank: Administration
Well... You may also look at the source code of the original PdfToolStripSearch. This code does exactly what you need: it searches for the text, selects it, and positions the view on the match. https://github.com/Patag...rs/PdfToolStripSearch.cs
How to zoom in on the page around a point is shown at the link in my previous post.
 1 user thanked Paul Rayman for this useful post.