See:
Description
Interface Summary | Description |
---|---|
I_CmsExtractionResult | The result of a document text extraction. |
I_CmsTextExtractor | Allows extraction of the indexable "plain" text plus (optional) meta information from a given binary input document format. |
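The two interfaces split the work between an extractor, which turns a binary document into indexable plain text, and the result object it returns, which carries that text plus optional metadata. The following is a minimal, self-contained sketch of that split. It assumes hypothetical names (`TextExtractor`, `ExtractionResult`, `extractText`, `getContent`, `getMeta`, `PlainTextExtractor`) and does not reproduce the actual OpenCms signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogue of I_CmsExtractionResult: plain text plus optional metadata.
interface ExtractionResult {
    String getContent();
    Map<String, String> getMeta();
}

// Hypothetical analogue of I_CmsTextExtractor: binary document in, indexable text out.
interface TextExtractor {
    ExtractionResult extractText(InputStream in, Charset encoding) throws IOException;
}

// Immutable result holder (hypothetical analogue of CmsExtractionResult).
final class SimpleExtractionResult implements ExtractionResult {
    private final String content;
    private final Map<String, String> meta;

    SimpleExtractionResult(String content, Map<String, String> meta) {
        this.content = content;
        // Defensive copy so callers cannot modify the stored metadata.
        this.meta = Collections.unmodifiableMap(new HashMap<>(meta));
    }

    @Override
    public String getContent() {
        return content;
    }

    @Override
    public Map<String, String> getMeta() {
        return meta;
    }
}

// Trivial extractor for plain-text input: decodes the bytes and collapses whitespace.
final class PlainTextExtractor implements TextExtractor {
    @Override
    public ExtractionResult extractText(InputStream in, Charset encoding) throws IOException {
        String raw = new String(in.readAllBytes(), encoding); // readAllBytes requires Java 9+
        String text = raw.replaceAll("\\s+", " ").trim();
        Map<String, String> meta = new HashMap<>();
        meta.put("length", Integer.toString(text.length()));
        return new SimpleExtractionResult(text, meta);
    }
}
```

Keeping the result behind its own interface lets a format-specific extractor (PDF, RTF, MS Office) attach whatever metadata it can recover without changing how callers consume the text.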
Class Summary | Description |
---|---|
A_CmsTextExtractor | Base utility class that allows extraction of the indexable "plain" text from a given document format. |
A_CmsTextExtractorMsOfficeBase | Base class to extract summary information from MS Office documents. |
CmsExtractionResult | The result of a document text extraction. |
CmsExtractorHtml | Extracts the text from an HTML document. |
CmsExtractorMsExcel | Extracts the text from an MS Excel document. |
CmsExtractorMsPowerPoint | Extracts the text from an MS PowerPoint document. |
CmsExtractorMsWord | Extracts the text from an MS Word document. |
CmsExtractorOpenOffice | Extracts the text from OpenOffice documents (.ods, .odf). |
CmsExtractorPdf | Extracts the text from a PDF document. |
CmsExtractorRtf | Extracts the text from an RTF document. |
Contains a generic, low-level framework for extraction of plain text content out of various popular file formats.
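A framework that covers several file formats needs some way to route each document to the extractor for its format. One common approach is a registry keyed by file extension, sketched below. This is an illustrative assumption, not the package's documented mechanism: `ExtractorRegistry`, `register`, `forFile`, and `naiveHtmlText` are hypothetical names, and the tag-stripping HTML function is a deliberately naive stand-in that does not reflect how `CmsExtractorHtml` works.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical registry that maps a file extension to a text extraction function.
final class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    // Registers an extractor for one extension, e.g. "html" or "rtf" (case-insensitive).
    void register(String extension, Function<byte[], String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Looks up the extractor for a file name by its last extension; empty if none matches.
    Optional<Function<byte[], String>> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }

    // Naive HTML extractor: replaces tags with spaces and collapses whitespace.
    // It does not decode entities and keeps the text inside <script>/<style> elements.
    static String naiveHtmlText(byte[] data) {
        String html = new String(data, StandardCharsets.UTF_8);
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}
```

A real HTML extractor would use a proper parser; the point here is only that adding a new format means registering one more extractor rather than changing the callers.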