eagleview
#1 Posted: 9 years ago
Hi,

Does the Tesseract.Net SDK support Tesseract's ability to generate searchable PDF output? I found that newer versions of Tesseract (3.03 RC and later) can write PDF output directly. That would make it much easier to work with the text Tesseract recognizes. Is this feature already exposed in the .Net SDK? If so, could you show a quick example?

Thanks!

Michael
Paul Rayman
#2 Posted: 9 years ago
Yes, of course.

Please look at the OcrPdfRenderer class. For example, to convert a multipage TIFF into a searchable PDF:

Code:

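public void Tiff2Pdf()
{
    using (var api = OcrApi.Create())
    {
        api.Init(Languages.English);
        // Create a renderer that writes a searchable PDF. The first argument is the
        // output file name without extension (".pdf" is added automatically); the
        // second is the tessdata folder the PDF renderer loads its resources from.
        using (var renderer = OcrPdfRenderer.Create("multipage_pdf_file", @"c:\YourApp\tessdata\"))
        {
            renderer.BeginDocument("Title");
            // Recognize every page of the TIFF and send the results to the renderer.
            // null = no retry config, 0 = no timeout.
            api.ProcessPages(@"c:\multipage.tif", null, 0, renderer);
            renderer.EndDocument();
        }
    }
}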


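Or, to OCR an existing PDF, render each page to a bitmap with the Pdfium.Net SDK (PdfDocument, PdfBitmap) and pass the bitmap to the OCR engine: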

Code:

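static void Main(string[] args)
{
    // Initialize the Pdfium library before loading any PDF documents.
    PdfCommon.Initialize();

    // Page sizes are in points (1/72 inch), so a factor of 1 renders at 72 DPI.
    // Tesseract usually recognizes text more accurately at around 300 DPI
    // (scaleFactor = 300.0 / 72).
    double scaleFactor = 1;

    using (var ocr = OcrApi.Create())
    {
        ocr.Init(Languages.English);

        // ".pdf" is appended to the output name automatically.
        using (var renderer = OcrPdfRenderer.Create(@"d:\3\multipage_pdf_file", @"tessdata\"))
        {
            renderer.BeginDocument("document title");

            int i = 0;
            using (var doc = PdfDocument.Load(@"d:\3\review.pdf"))
            {
                foreach (var page in doc.Pages)
                {
                    Console.WriteLine("Page {0}", i++);
                    int width = (int)(page.Width * scaleFactor);
                    int height = (int)(page.Height * scaleFactor);
                    using (var bitmap = new PdfBitmap(width, height, true))
                    {
                        // Paint a white background, then render the page onto it.
                        bitmap.FillRect(0, 0, width, height, Color.White);
                        page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_LCD_TEXT);
                        // OCR the rendered bitmap and append the result as a page of the output PDF.
                        ocr.ProcessPage(OcrPix.FromBitmap(bitmap.Image as Bitmap), null, 0, renderer);
                    }
                }
            }
            renderer.EndDocument();
        }
    }
}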

eagleview
#3 Posted: 9 years ago
Looks perfect for what I need to do! Thanks, Paul!!
Paul Rayman
#4 Posted: 9 years ago
You're welcome.