eagleview  
#1 Posted : Thursday, February 4, 2016 12:19:16 PM(UTC)
eagleview

Rank: Member

Groups: Registered
Joined: 1/28/2016(UTC)
Posts: 17
United States

Thanks: 3 times
Hi,

Does the Tesseract.Net SDK support Tesseract's ability to produce a searchable PDF as output? Newer versions of Tesseract (3.03 RC and later) can write PDF output directly, which would make it much easier to work with the text that Tesseract finds. Is this already included in the .Net SDK? If so, can you show a quick example?

Thanks!

Michael
Paul Rayman  
#2 Posted : Friday, February 5, 2016 6:25:48 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 1,011

Thanks: 5 times
Was thanked: 121 time(s) in 118 post(s)
Yes, of course.

Please look at the OcrPdfRenderer class.

Code:

public void Tiff2Pdf()
{
    using (var api = OcrApi.Create())
    {
        api.Init(Languages.English);
        // Create a renderer that writes the PDF output file. The .pdf extension is added to the name automatically.
        using (var renderer = OcrPdfRenderer.Create("multipage_pdf_file", "c:\\YourApp\\tessdata\\"))
        {
            renderer.BeginDocument("Title");
            api.ProcessPages(@"c:\multipage.tif", null, 0, renderer);
            renderer.EndDocument();
        }
    }
}


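Or, to OCR an existing PDF by rendering each page to a bitmap first (PdfCommon, PdfDocument and PdfBitmap are Pdfium.Net SDK classes):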

Code:

static void Main(string[] args)
{
	// Initialize the PDF library before loading any document
	PdfCommon.Initialize();

	// 1 renders each page at its own size; use a larger value for a bigger bitmap
	double scaleFactor = 1;

	// Wrap the OCR engine in a using block so it is disposed even if an exception is thrown
	using (var ocr = OcrApi.Create())
	{
		ocr.Init(Languages.English);

		using (var renderer = OcrPdfRenderer.Create(@"d:\3\multipage_pdf_file", "tessdata\\"))
		{
			renderer.BeginDocument("document title");

			int i = 0;
			using (var doc = PdfDocument.Load(@"d:\3\review.pdf"))
			{
				foreach (var page in doc.Pages)
				{
					Console.WriteLine("Page {0}", i++);
					int width = (int)(page.Width * scaleFactor);
					int height = (int)(page.Height * scaleFactor);
					using (var bitmap = new PdfBitmap(width, height, true))
					{
						// Paint a white background, then render the page on top of it
						bitmap.FillRect(0, 0, width, height, Color.White);
						page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_LCD_TEXT);
						// Recognize the rendered page and append it to the output PDF
						ocr.ProcessPage(OcrPix.FromBitmap(bitmap.Image as Bitmap), null, 0, renderer);
					}
				}
			}
			renderer.EndDocument();
		}
	}
}
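
A note on scaleFactor in the second example: assuming page.Width and page.Height are reported in PDF points (1/72 inch), a factor of 1 renders at roughly 72 DPI, which is often too coarse for reliable OCR. The sketch below shows one way to derive the factor from a target resolution. RenderScale and its methods are hypothetical helper names for illustration only and are not part of either SDK.

```csharp
using System;

// Hypothetical helper, not part of Tesseract.Net SDK or Pdfium.Net SDK.
static class RenderScale
{
    // Assumption: page sizes are expressed in PDF points, 72 per inch.
    const double PointsPerInch = 72.0;

    // Returns the factor that maps page points to pixels at the given DPI.
    public static double ForDpi(double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return dpi / PointsPerInch;
    }

    // Converts a page dimension in points to whole pixels, rounding to nearest.
    public static int ToPixels(double points, double scaleFactor)
    {
        return (int)Math.Round(points * scaleFactor);
    }
}
```

With this in place, the second example could set scaleFactor = RenderScale.ForDpi(300) and compute width and height with RenderScale.ToPixels. A US Letter page (612 x 792 points) would then render to a 2550 x 3300 pixel bitmap. Larger bitmaps take more memory and OCR time, so the target DPI is a trade-off to tune for your documents.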

Edited by user Thursday, March 31, 2016 5:28:26 AM(UTC)  | Reason: Not specified

eagleview  
#3 Posted : Friday, February 5, 2016 10:58:26 AM(UTC)
Looks perfect for what I need to do! Thanks, Paul!!
Paul Rayman  
#4 Posted : Friday, February 5, 2016 11:04:39 AM(UTC)
You're welcome.