org.opencms.search.extractors
Class CmsExtractorHtml

java.lang.Object
  extended by org.opencms.search.extractors.A_CmsTextExtractor
      extended by org.opencms.search.extractors.CmsExtractorHtml
All Implemented Interfaces:
I_CmsTextExtractor

public final class CmsExtractorHtml
extends A_CmsTextExtractor

Extracts the text from an HTML document.

Since:
6.0.0
Version:
$Revision: 1.14 $
Author:
Alexander Kandzior

Field Summary
 
Fields inherited from class org.opencms.search.extractors.A_CmsTextExtractor
m_inputBuffer
 
Method Summary
 I_CmsExtractionResult extractText(java.io.InputStream in, java.lang.String encoding)
          Extracts the text and meta information from the document on the input stream, using the specified content encoding.
static I_CmsTextExtractor getExtractor()
          Returns an instance of this text extractor.
 
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractor
combineContentItem, extractText, extractText, extractText, getStreamCopy, removeControlChars
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getExtractor

public static I_CmsTextExtractor getExtractor()
Returns an instance of this text extractor.

Returns:
an instance of this text extractor

extractText

public I_CmsExtractionResult extractText(java.io.InputStream in,
                                         java.lang.String encoding)
                                  throws java.lang.Exception
Description copied from interface: I_CmsTextExtractor
Extracts the text and meta information from the document on the input stream, using the specified content encoding.

The encoding is a hint for the text extractor. If the value given is null, the text extractor should try to determine the encoding itself.

Specified by:
extractText in interface I_CmsTextExtractor
Overrides:
extractText in class A_CmsTextExtractor
Parameters:
in - the input stream for the document to extract the text from
encoding - the content encoding to use as a hint, or null if the extractor should determine the encoding itself
Returns:
the extracted text and meta information
Throws:
java.lang.Exception - if the text extraction fails
See Also:
I_CmsTextExtractor.extractText(java.io.InputStream, java.lang.String)
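
Example (illustrative only):
The description of extractText says the encoding is only a hint and that a null value leaves detection to the extractor. The following is a minimal, JDK-only sketch of how such a contract could be honored. It is not the OpenCms implementation: the class EncodingHintSketch, its methods, the lookup order (hint, then a charset declaration found in the first 1024 bytes, then UTF-8) and the regex-based tag stripping are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in; not part of org.opencms.search.extractors.
class EncodingHintSketch {

    // Matches both <meta charset="x"> and content="text/html; charset=x".
    private static final Pattern META_CHARSET = Pattern.compile(
        "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // Resolves the charset: usable hint first, then a declaration in the
    // document head, then UTF-8 as an assumed default.
    static Charset resolveCharset(String hint, byte[] content) {
        Charset fromHint = lookup(hint);
        if (fromHint != null) {
            return fromHint;
        }
        // ISO-8859-1 maps every byte to a char, so it is safe for sniffing.
        int len = Math.min(content.length, 1024);
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            Charset declared = lookup(m.group(1));
            if (declared != null) {
                return declared;
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Returns the named charset, or null if the name is null, illegal or unsupported.
    private static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            return null; // illegal charset name
        }
    }

    // Decodes the bytes and strips markup with a naive regex (no entity
    // decoding, no special handling of script or style elements).
    static String extractText(byte[] content, String encoding) {
        String html = new String(content, resolveCharset(encoding, content));
        return html.replaceAll("(?s)<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Stream variant mirroring the (InputStream, String) parameter list.
    static String extractText(InputStream in, String encoding) throws IOException {
        return extractText(in.readAllBytes(), encoding); // readAllBytes needs Java 9+
    }
}
```

In this sketch, an unsupported or null hint is not an error; the method simply falls through to detection. A production extractor would normally use a real HTML parser rather than a regular expression.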