org.opencms.util
Class CmsHtmlStripper

java.lang.Object
  extended by org.opencms.util.CmsHtmlStripper

public final class CmsHtmlStripper
extends java.lang.Object

Simple HTML tag stripper that can be configured with the names of the HTML tags to keep.

All tags that are not explicitly allowed by calling one of the addPreserve... methods are removed from the result of stripHtml(String).
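The allow-list behaviour described above can be illustrated with a small stand-in. This is a minimal sketch, not the OpenCms implementation: the class AllowListStripperSketch is hypothetical, it matches tags with a simple regular expression instead of a real HTML parser, and it assumes that the text inside a stripped tag is kept.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the OpenCms implementation: it only mimics the
// documented allow-list behaviour with a simple regular expression.
final class AllowListStripperSketch {

    // Matches an opening or closing tag and captures the tag name in group 1.
    private static final Pattern TAG =
        Pattern.compile("</?\\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Lower-cased names of the tags to keep (tag names are case insensitive).
    private final Set<String> preserved = new HashSet<>();

    // Returns true if the name was not registered before (an assumption about
    // what "added correctly" means).
    boolean addPreserveTag(String tagName) {
        return preserved.add(tagName.toLowerCase(Locale.ROOT));
    }

    // Copies the input, dropping every tag whose name is not preserved.
    String stripHtml(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // text before the tag
            if (preserved.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                out.append(m.group()); // keep an allowed tag unchanged
            }
            last = m.end();
        }
        out.append(html, last, html.length()); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        AllowListStripperSketch stripper = new AllowListStripperSketch();
        stripper.addPreserveTag("B"); // case does not matter
        System.out.println(stripper.stripHtml("<p>Hello <b>big</b> <i>world</i></p>"));
    }
}
```

In this sketch only the b tags survive, because nothing else was registered. The real class may treat whitespace, entities, and malformed markup differently.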

Instances are reusable but not thread-safe, so they must not be shared between threads. To change the configuration between subsequent invocations of stripHtml(String), call reset() first.
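One common way to honour a "reusable but not shareable" contract is to give each thread its own instance and to reset it before every new configuration. The following is a sketch under stated assumptions: PerThreadConfigSketch and TagConfig are hypothetical stand-ins that model only the configuration state and do not strip anything.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a stateful object that must not be shared
// between threads; it is not part of OpenCms.
final class PerThreadConfigSketch {

    // Minimal mutable state comparable to "the tags to preserve".
    static final class TagConfig {
        private final Set<String> preserved = new HashSet<>();

        boolean addPreserveTag(String tagName) {
            return preserved.add(tagName.toLowerCase(Locale.ROOT));
        }

        void reset() {
            preserved.clear();
        }

        Set<String> preserved() {
            return preserved;
        }
    }

    // Each thread lazily gets its own instance, so no instance is shared.
    static final ThreadLocal<TagConfig> CONFIG =
        ThreadLocal.withInitial(TagConfig::new);

    // Reuses the calling thread's instance with a fresh configuration.
    static TagConfig configure(String... tagNames) {
        TagConfig config = CONFIG.get();
        config.reset(); // discard whatever the previous use configured
        for (String name : tagNames) {
            config.addPreserveTag(name);
        }
        return config;
    }

    public static void main(String[] args) throws InterruptedException {
        TagConfig first = configure("b", "i");
        TagConfig second = configure("p");
        System.out.println(first == second);    // same thread, same instance
        System.out.println(second.preserved()); // only the second configuration

        // A different thread receives a different instance.
        Thread other = new Thread(() -> System.out.println(configure("a") != second));
        other.start();
        other.join();
    }
}
```

Creating a new instance per call is an equally valid choice when construction is cheap. The per-thread variant only pays off if instances are expensive to build.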

Since:
6.9.2
Version:
$Revision: 1.8 $
Author:
Achim Westermann

Constructor Summary
CmsHtmlStripper()
          Default constructor that turns echo on and uses the settings for replacing tags.
CmsHtmlStripper(boolean useTidy)
          Creates an instance and controls whether tidy is used.
 
Method Summary
 boolean addPreserveTag(java.lang.String tagName)
          Adds a tag that will be preserved by stripHtml(String).
 void addPreserveTagList(java.util.List preserveTags)
          Convenience method for adding several tags to preserve.
 void addPreserveTags(java.lang.String tagList, char separator)
          Convenience method for adding several tags to preserve in form of a delimiter-separated String.
 void reset()
          Resets the configuration of the tags to preserve.
 java.lang.String stripHtml(java.lang.String html)
          Extracts the text from the given html content, assuming the given html encoding.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CmsHtmlStripper

public CmsHtmlStripper()
Default constructor that turns echo on and uses the settings for replacing tags.


CmsHtmlStripper

public CmsHtmlStripper(boolean useTidy)
Creates an instance and controls whether tidy is used.

Parameters:
useTidy - if true, tidy will be used

Method Detail

addPreserveTag

public boolean addPreserveTag(java.lang.String tagName)
Adds a tag that will be preserved by stripHtml(String).

Parameters:
tagName - the name of the tag to keep (case insensitive)
Returns:
true if the tagName was added correctly to the internal engine

addPreserveTagList

public void addPreserveTagList(java.util.List preserveTags)
Convenience method for adding several tags to preserve.

Parameters:
preserveTags - a List<String> with the case-insensitive tag names of the tags to preserve
See Also:
addPreserveTag(String)

addPreserveTags

public void addPreserveTags(java.lang.String tagList,
                            char separator)
Convenience method for adding several tags to preserve in form of a delimiter-separated String.

The String will be split using CmsStringUtil.splitAsList(String, char, boolean), with tagList as the first argument, separator as the second argument, and the third argument set to true (trimming support).

Parameters:
tagList - a delimiter-separated String with case-insensitive tag names to preserve by stripHtml(String)
separator - the delimiter that separates tag names in the tagList argument
See Also:
addPreserveTag(String)
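To make the effect of trimming concrete, here is a stdlib-only approximation of splitting a delimiter-separated tag list. It is a sketch, not CmsStringUtil.splitAsList itself: the class TagListSplitSketch and the method splitAndTrim are hypothetical, and skipping empty entries is an assumption that the real utility may not share.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only approximation of splitting a delimiter-separated tag list with
// trimming; this is NOT CmsStringUtil.splitAsList itself.
final class TagListSplitSketch {

    static List<String> splitAndTrim(String tagList, char separator) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start <= tagList.length()) {
            int end = tagList.indexOf(separator, start);
            if (end < 0) {
                end = tagList.length(); // last token runs to the end
            }
            String token = tagList.substring(start, end).trim();
            if (!token.isEmpty()) { // assumption: empty entries are skipped
                result.add(token);
            }
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Whitespace around the names is removed before they are used.
        System.out.println(splitAndTrim("b, i ,strong", ','));
    }
}
```

With trimming, a list such as "b, i ,strong" yields the same tag names as "b,i,strong", so callers do not need to normalise spacing themselves.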

reset

public void reset()
Resets the configuration of the tags to preserve.

This is called from the constructor and only has to be called explicitly if this instance is reused with a different configuration (of tags to keep).


stripHtml

public java.lang.String stripHtml(java.lang.String html)
                           throws org.htmlparser.util.ParserException
Extracts the text from the given html content, assuming the given html encoding.

Additionally, tags are replaced or removed according to the configuration of this instance.

Please note:

There are static process methods in the superclass that will not do the replacements / removals. Don't mix them up with this method.

Parameters:
html - the content to extract the plain text from.
Returns:
the text extracted from the given html content.
Throws:
org.htmlparser.util.ParserException - if something goes wrong.