Package org.opencms.search.extractors

Contains a generic, low-level framework for extraction of plain text content out of various popular file formats.
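A framework like this usually has one extractor per file format and picks the right one for each document. The sketch below shows that dispatch step keyed on the file extension. `ExtractorRegistry`, `register` and `extract` are hypothetical names used for illustration only; they are not part of the OpenCms API, and OpenCms may select extractors differently (for example by configuration rather than by extension).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the OpenCms API: maps a file extension to a
// function that turns raw document bytes into plain text.
class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    // Registers an extractor; extensions are stored in lowercase so that
    // "TXT" and "txt" are treated the same.
    void register(String extension, Function<byte[], String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Chooses the extractor by the file name's extension. Returns null when
    // the name has no extension or no extractor is registered, so the caller
    // can skip the file instead of indexing garbage.
    String extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = extractors.get(ext);
        return extractor == null ? null : extractor.apply(content);
    }
}
```

For example, registering `b -> new String(b, StandardCharsets.UTF_8)` under `"txt"` makes `extract("README.txt", bytes)` return the decoded text, while an unregistered format such as `image.png` yields `null`.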


Interface Summary
I_CmsExtractionResult The result of a document text extraction.
I_CmsTextExtractor Allows extraction of the indexable "plain" text, plus optional meta information, from a document in a given binary input format.
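The two interfaces split the work into an extractor (binary document in) and a result (plain text plus optional meta information out). The following is a minimal sketch of that contract under stated assumptions: `TextExtractor`, `ExtractionResult`, `PlainTextExtractor` and the method signatures are hypothetical stand-ins, not the actual `I_CmsTextExtractor` or `I_CmsExtractionResult` signatures.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an extraction result: the indexable plain text
// plus optional meta information (for example a document title).
final class ExtractionResult {
    private final String content;
    private final Map<String, String> metaInfo;

    ExtractionResult(String content, Map<String, String> metaInfo) {
        this.content = content;
        // Defensive, read-only copy so callers cannot modify the result.
        this.metaInfo = Collections.unmodifiableMap(new HashMap<>(metaInfo));
    }

    String getContent() { return content; }

    Map<String, String> getMetaInfo() { return metaInfo; }
}

// Hypothetical extractor contract: binary input in, text plus metadata out.
interface TextExtractor {
    ExtractionResult extractText(InputStream in, String encoding) throws IOException;
}

// Trivial implementation for input that is already plain text: read all
// bytes, decode them with the given encoding and attach no metadata.
final class PlainTextExtractor implements TextExtractor {
    @Override
    public ExtractionResult extractText(InputStream in, String encoding) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        String text = new String(buffer.toByteArray(), Charset.forName(encoding));
        return new ExtractionResult(text, Collections.emptyMap());
    }
}
```

Format-specific extractors would implement the same interface and fill the metadata map with whatever the format exposes, such as a title or author.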
 

Class Summary
A_CmsTextExtractor Base utility class that allows extraction of the indexable "plain" text from a document in a given format.
A_CmsTextExtractorMsOfficeBase Base class to extract summary information from MS Office documents.
CmsExtractionResult The result of a document text extraction.
CmsExtractorHtml Extracts the text from an HTML document.
CmsExtractorMsExcel Extracts the text from an MS Excel document.
CmsExtractorMsPowerPoint Extracts the text from an MS PowerPoint document.
CmsExtractorMsWord Extracts the text from an MS Word document.
CmsExtractorOpenOffice Extracts the text from OpenOffice documents (.ods, .odf).
CmsExtractorPdf Extracts the text from a PDF document.
CmsExtractorRtf Extracts the text from an RTF document.
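To illustrate what "extracting the text from an HTML document" involves, here is a deliberately naive, regex-based sketch. It is not how `CmsExtractorHtml` is implemented; a production extractor would normally use a real HTML parser, since regular expressions cannot handle all HTML correctly. `NaiveHtmlExtractor` and its methods are hypothetical names. The sketch drops `script` and `style` blocks, strips tags, decodes a few common entities, collapses whitespace, and separately pulls out the `<title>` as an example of meta information.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive HTML-to-text sketch (not OpenCms code).
final class NaiveHtmlExtractor {
    // Whole <script>...</script> or <style>...</style> blocks, any case.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)[^>]*>.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern TITLE =
        Pattern.compile("(?is)<title[^>]*>(.*?)</title\\s*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the visible text; tags are replaced by spaces so that words
    // from adjacent elements do not run together.
    static String extractText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode "&amp;" last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Returns the whitespace-normalized <title> text, or null if absent.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? WHITESPACE.matcher(m.group(1)).replaceAll(" ").trim() : null;
    }
}
```

Note that the title text also appears in the output of `extractText`, because only the tags are removed, not the `<title>` element's content.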
 

Package org.opencms.search.extractors Description

Contains a generic, low-level framework for extraction of plain text content out of various popular file formats.

Since:
6.0.0
Version:
$Revision: 1.7 $