Rank: Guest
Groups: Guests
Joined: 1/5/2016(UTC) Posts: 162
Was thanked: 5 time(s) in 5 post(s)
Hello,
While evaluating and using the Pdfium.NET SDK, we ran into a problem when extracting text from pages.
We have a document that can be opened in Adobe Acrobat Reader and Google Chrome (PDF Viewer, PDFium).
If we select the text on the first page in those viewers, the drawn rectangles (extracted glyphs) are correct.
When we extract the text from the first page with Patagames.Pdf (using PdfTextObject.GetCharRect()), the rectangles detected for the characters are incorrect.
However, the BoundingBox of the whole text row (PdfTextObject) is correct.
Rank: Administration
Groups: Administrators
Joined: 1/5/2016(UTC) Posts: 1,113
Thanks: 8 times Was thanked: 130 time(s) in 127 post(s)
Hi,
This method returns the raw data without applying the transformation matrices. You have to do it yourself. Something like this:
Code:
var bb = obj.GetCharRect(i);
var matrix = obj.TextMatrix;
bb.left = bb.left * matrix.a + matrix.e;
bb.right = bb.right * matrix.a + matrix.e;
bb.top = bb.top * matrix.d + matrix.f;
bb.bottom = bb.bottom * matrix.d + matrix.f;
Please take a look at the code below. It illustrates how to convert the raw char rect into page coordinates and then into the user control's coordinates. I checked it on your file (incorrect_rectangles.pdf), and it correctly fills all letters on the current page. Code:
private void button45_Click(object sender, EventArgs e)
{
    var page = pdfViewer1.Document.Pages.CurrentPage;
    int pageIndex = pdfViewer1.Document.Pages.CurrentIndex;
    using (var g = Graphics.FromHwnd(pdfViewer1.Handle))
    // One semi-transparent brush for all characters, disposed with the Graphics object.
    using (var brush = new SolidBrush(Color.FromArgb(50, 99, 0, 0)))
    {
        foreach (var o in page.PageObjects)
        {
            var obj = o as PdfTextObject;
            if (obj == null)
                continue; // skip non-text page objects
            var matrix = obj.TextMatrix;
            for (int i = 0; i < obj.CharsCount; i++)
            {
                // Raw char rect: the text matrix has not been applied yet.
                var bb = obj.GetCharRect(i);
                // Scale and translate into page coordinates.
                bb.left = bb.left * matrix.a + matrix.e;
                bb.right = bb.right * matrix.a + matrix.e;
                bb.top = bb.top * matrix.d + matrix.f;
                bb.bottom = bb.bottom * matrix.d + matrix.f;
                // Convert page coordinates into the control's client coordinates.
                var pt1 = pdfViewer1.PageToClient(pageIndex, new PointF(bb.left, bb.top));
                var pt2 = pdfViewer1.PageToClient(pageIndex, new PointF(bb.right, bb.bottom));
                g.FillRectangle(brush, pt1.X, pt1.Y, pt2.X - pt1.X, pt2.Y - pt1.Y);
            }
        }
    }
}
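The code above uses only the a, d, e and f components of the matrix, which covers scaling and translation. For rotated or skewed text the b and c components would matter as well. The following is a minimal sketch of that more general case, assuming the standard PDF matrix convention (x' = a*x + c*y + e, y' = b*x + d*y + f). The types `Matrix6` and `RectF4` and the class `CharRectTransform` are hypothetical stand-ins for illustration, not part of the Patagames API, and it has not been checked against incorrect_rectangles.pdf.

```csharp
using System;

// Hypothetical stand-in types for illustration; NOT the Patagames types.
public struct Matrix6
{
    public float a, b, c, d, e, f;
}

public struct RectF4
{
    public float left, top, right, bottom;
}

public static class CharRectTransform
{
    // Maps one point using the PDF convention:
    // x' = a*x + c*y + e,  y' = b*x + d*y + f
    private static void Apply(Matrix6 m, float x, float y, out float tx, out float ty)
    {
        tx = m.a * x + m.c * y + m.e;
        ty = m.b * x + m.d * y + m.f;
    }

    // Transforms all four corners of the raw rectangle and returns the
    // axis-aligned bounding box of the result. Page coordinates are assumed,
    // so "top" is the larger y value and "bottom" the smaller one.
    public static RectF4 Transform(RectF4 r, Matrix6 m)
    {
        float[] xs = new float[4];
        float[] ys = new float[4];
        Apply(m, r.left, r.top, out xs[0], out ys[0]);
        Apply(m, r.right, r.top, out xs[1], out ys[1]);
        Apply(m, r.right, r.bottom, out xs[2], out ys[2]);
        Apply(m, r.left, r.bottom, out xs[3], out ys[3]);

        RectF4 result;
        result.left = Math.Min(Math.Min(xs[0], xs[1]), Math.Min(xs[2], xs[3]));
        result.right = Math.Max(Math.Max(xs[0], xs[1]), Math.Max(xs[2], xs[3]));
        result.bottom = Math.Min(Math.Min(ys[0], ys[1]), Math.Min(ys[2], ys[3]));
        result.top = Math.Max(Math.Max(ys[0], ys[1]), Math.Max(ys[2], ys[3]));
        return result;
    }
}
```

When b and c are zero and a and d are positive, this gives the same result as the four assignments in the code above. For rotated text the returned box is only the axis-aligned envelope of the glyph, not the rotated quadrilateral itself.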
Edited by user Saturday, January 23, 2016 10:06:15 AM(UTC)