java.lang.Object
  org.opencms.search.CmsSearchIndex
public class CmsSearchIndex
Implements the search within an index and the management of the index configuration.
Nested Class Summary | |
---|---|
protected class |
CmsSearchIndex.LazyContentReader
Lucene filter index reader implementation that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB
are lazy loaded. |
Field Summary | |
---|---|
static java.lang.String |
BACKUP_REINDEXING
Constant for additional parameter to enable optimized full index regeneration (default: false). |
protected static org.apache.lucene.document.FieldSelector |
CONTENT_SELECTOR
Field selector for Lucene that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB
are lazy loaded. |
static java.lang.String |
EXCERPT
Constant for additional parameter to enable excerpt creation (default: true). |
static java.lang.String |
EXTRACT_CONTENT
Constant for additional parameter for index content extraction. |
static java.lang.String |
LUCENE_AUTO_COMMIT
Constant for additional parameter for the Lucene index setting. |
static java.lang.String |
LUCENE_MAX_MERGE_DOCS
Constant for additional parameter for the Lucene index setting. |
static java.lang.String |
LUCENE_MERGE_FACTOR
Constant for additional parameter for the Lucene index setting. |
static java.lang.String |
LUCENE_RAM_BUFFER_SIZE_MB
Constant for additional parameter for the Lucene index setting. |
static java.lang.String |
LUCENE_USE_COMPOUND_FILE
Constant for additional parameter for the Lucene index setting. |
protected java.util.List<CmsSearchIndexSource> |
m_sources
The list of configured index sources. |
static java.lang.String |
MAX_HITS
Constant for additional parameter for controlling how many hits are loaded at maximum (default: MAX_HITS_DEFAULT). |
static int |
MAX_HITS_DEFAULT
Indicates how many hits are loaded at maximum by default. |
static java.lang.String |
PERMISSIONS
Constant for additional parameter to enable permission checks (default: true). |
static java.lang.String |
PRIORITY
Constant for additional parameter to set the thread priority during search. |
static java.lang.String |
REBUILD_MODE_AUTO
Automatic ("auto") index rebuild mode. |
static java.lang.String |
REBUILD_MODE_MANUAL
Manual ("manual") index rebuild mode. |
static java.lang.String |
REBUILD_MODE_OFFLINE
Offline ("offline") index rebuild mode. |
static java.lang.String |
TIME_RANGE
Constant for additional parameter to enable time range checks (default: true). |
Fields inherited from interface org.opencms.configuration.I_CmsConfigurationParameterHandler |
---|
ADD_PARAMETER_METHOD, INIT_CONFIGURATION_METHOD |
Constructor Summary | |
---|---|
CmsSearchIndex()
Default constructor only intended to be used by the XML configuration. |
|
CmsSearchIndex(java.lang.String name)
Creates a new CmsSearchIndex with the given name. |
Method Summary | |
---|---|
void |
addConfigurationParameter(java.lang.String key,
java.lang.String value)
Adds a parameter. |
void |
addSourceName(java.lang.String sourceName)
Adds an index source to this search index. |
boolean |
checkConfiguration(CmsObject cms)
Checks if this index has been configured correctly. |
protected java.lang.String |
createIndexBackup()
Creates a backup of this index for optimized re-indexing of the whole content. |
boolean |
equals(java.lang.Object obj)
|
protected void |
extendPathFilter(org.apache.lucene.search.TermsFilter pathFilter,
java.lang.String searchRoot)
Extends the given path query with another term for the given search root element. |
org.apache.lucene.analysis.Analyzer |
getAnalyzer()
Returns the Lucene analyzer used for this index. |
java.util.Map<java.lang.String,java.lang.Object> |
getConfiguration()
Returns the configuration of this parameter configurable class instance, or null if the class does not need to be configured. |
org.apache.lucene.document.Document |
getDocument(java.lang.String rootPath)
Returns the Lucene document with the given root path from the index. |
I_CmsDocumentFactory |
getDocumentFactory(CmsResource res)
Returns the document type factory used for the given resource in this index, or null
in case the resource is not indexed by this index. |
CmsSearchFieldConfiguration |
getFieldConfiguration()
Returns the search field configuration of this index. |
java.lang.String |
getFieldConfigurationName()
Returns the name of the field configuration used for this index. |
org.apache.lucene.index.IndexWriter |
getIndexWriter(boolean create)
Returns a new index writer for this index. |
java.util.Locale |
getLocale()
Returns the language locale of this index. |
java.lang.String |
getLocaleString()
Returns the language locale of the index as a String. |
int |
getMaxHits()
Returns the maximum number of hits that are loaded. |
protected org.apache.lucene.search.Filter |
getMultiTermQueryFilter(java.lang.String field,
java.util.List<java.lang.String> terms)
Returns a cached Lucene term query filter for the given field and terms. |
protected org.apache.lucene.search.Filter |
getMultiTermQueryFilter(java.lang.String field,
java.lang.String terms)
Returns a cached Lucene term query filter for the given field and terms. |
protected org.apache.lucene.search.Filter |
getMultiTermQueryFilter(java.lang.String field,
java.lang.String termsStr,
java.util.List<java.lang.String> termsList)
Returns a cached Lucene term query filter for the given field and terms. |
java.lang.String |
getName()
Gets the name of this index. |
java.lang.String |
getPath()
Returns the path where this index stores its data in the "real" file system. |
int |
getPriority()
Returns the Thread priority for this search index. |
java.lang.String |
getProject()
Gets the project of this index. |
java.lang.String |
getRebuildMode()
Gets the rebuild mode of this index. |
org.apache.lucene.search.IndexSearcher |
getSearcher()
Returns the Lucene index searcher used for this search index. |
java.util.List<java.lang.String> |
getSourceNames()
Returns all configured source names of this search index. |
java.util.List<CmsSearchIndexSource> |
getSources()
Returns all configured index sources of this search index. |
protected org.apache.lucene.search.Filter |
getTermQueryFilter(java.lang.String field,
java.lang.String term)
Returns a cached Lucene term query filter for the given field and term. |
int |
hashCode()
|
protected boolean |
hasReadPermission(CmsObject cms,
org.apache.lucene.document.Document doc)
Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context. |
protected void |
indexSearcherClose()
Closes the Lucene index searcher for this index. |
protected void |
indexSearcherOpen(java.lang.String path)
Initializes the Lucene index searcher for this index. |
void |
initConfiguration()
Initializes a configuration after all parameters have been added. |
void |
initialize()
Initializes the search index. |
boolean |
isBackupReindexing()
Returns true if backup re-indexing is done by this index. |
boolean |
isCheckingPermissions()
Returns true if permissions are checked for search results by this index. |
boolean |
isCheckingTimeRange()
Returns true if the document time range is checked for search results by this index. |
boolean |
isCreatingExcerpt()
Returns true if an excerpt is generated by this index. |
boolean |
isEnabled()
Returns true if this index is currently enabled. |
boolean |
isExtractingContent()
Returns true if full text is extracted by this index. |
protected boolean |
isInTimeRange(org.apache.lucene.document.Document doc,
CmsSearchParameters params)
Checks if the document is in the time range specified in the search parameters. |
protected void |
removeIndexBackup(java.lang.String path)
Removes the given backup folder of this index. |
void |
removeSourceName(java.lang.String sourceName)
Removes an index source from this search index. |
CmsSearchResultList |
search(CmsObject cms,
CmsSearchParameters params)
Performs a search on the index within the given fields. |
void |
setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
Sets the Lucene analyzer used for this index. |
void |
setEnabled(boolean enabled)
Can be used to enable / disable this index. |
void |
setFieldConfiguration(CmsSearchFieldConfiguration fieldConfiguration)
Sets the field configuration used for this index. |
void |
setFieldConfigurationName(java.lang.String fieldConfigurationName)
Sets the name of the field configuration used for this index. |
void |
setLocale(java.util.Locale locale)
Sets the locale to index resources. |
void |
setLocaleString(java.lang.String locale)
Sets the locale to index resources as a String. |
void |
setMaxHits(int maxHits)
Sets the maximum number of hits that are loaded. |
void |
setName(java.lang.String name)
Sets the logical key/name of this search index. |
void |
setProject(java.lang.String projectName)
Sets the name of the project used to index resources. |
void |
setProjectName(java.lang.String projectName)
Sets the name of the project used to index resources. |
void |
setRebuildMode(java.lang.String rebuildMode)
Sets the rebuild mode of this search index. |
void |
shutDown()
Shuts down the search index. |
java.lang.String |
toString()
Returns the name (getName()) of this search index. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String BACKUP_REINDEXING
public static final java.lang.String EXCERPT
public static final java.lang.String EXTRACT_CONTENT
public static final java.lang.String LUCENE_AUTO_COMMIT
public static final java.lang.String LUCENE_MAX_MERGE_DOCS
public static final java.lang.String LUCENE_MERGE_FACTOR
public static final java.lang.String LUCENE_RAM_BUFFER_SIZE_MB
public static final java.lang.String LUCENE_USE_COMPOUND_FILE
public static final java.lang.String MAX_HITS
public static final int MAX_HITS_DEFAULT
public static final java.lang.String PERMISSIONS
public static final java.lang.String PRIORITY
public static final java.lang.String REBUILD_MODE_AUTO
public static final java.lang.String REBUILD_MODE_MANUAL
public static final java.lang.String REBUILD_MODE_OFFLINE
public static final java.lang.String TIME_RANGE
protected static final org.apache.lucene.document.FieldSelector CONTENT_SELECTOR
Field selector for Lucene that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB are lazy loaded.
This is to optimize performance: these two fields can be rather large, especially for extracted binary documents such as PDF or MS Office files. With lazy fields, the data is only read when it is actually used.
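The lazy-loading idea described above can be illustrated without Lucene. The following is a minimal sketch, not the OpenCms or Lucene implementation: `LazyField` and its loader are hypothetical names, and the loader merely stands in for reading a large stored field from the index.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily loaded stored field; not a Lucene class. */
final class LazyField {
    private final Supplier<String> loader; // reads the (possibly large) value on demand
    private String value;
    private boolean loaded;

    LazyField(Supplier<String> loader) {
        this.loader = loader;
    }

    /** Loads the value on first access and reuses it afterwards. */
    synchronized String stringValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        LazyField content = new LazyField(() -> {
            reads.incrementAndGet(); // counts how often the expensive read happens
            return "extracted document text";
        });
        System.out.println("loaded before access: " + content.isLoaded());
        System.out.println("value: " + content.stringValue());
        content.stringValue(); // second access reuses the cached value
        System.out.println("number of reads: " + reads.get());
    }
}
```

A search that only needs the path and title of each hit never calls `stringValue()` on the content field, so the large value is never read for those hits.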
protected java.util.List<CmsSearchIndexSource> m_sources
Constructor Detail |
---|
public CmsSearchIndex()
Default constructor only intended to be used by the XML configuration.
It is recommended to use the constructor CmsSearchIndex(String), as it enforces the mandatory name argument.
public CmsSearchIndex(java.lang.String name) throws CmsIllegalArgumentException
Parameters:
name - the system-wide unique name for the search index
Throws:
CmsIllegalArgumentException - if the given name is null, empty or already taken by another search index
Method Detail |
---|
public void addConfigurationParameter(java.lang.String key, java.lang.String value)
Specified by: addConfigurationParameter in interface I_CmsConfigurationParameterHandler
Parameters:
key - the key/name of the parameter
value - the value of the parameter
public void addSourceName(java.lang.String sourceName)
Parameters:
sourceName - the index source name to add
public boolean checkConfiguration(CmsObject cms)
In case the check fails, the enabled property is set to false.
Parameters:
cms - an OpenCms user context to perform the checks with (should have "Administrator" permissions)
Returns:
true in case the index is correctly configured and enabled after the check
See Also:
isEnabled()
public boolean equals(java.lang.Object obj)
Overrides: equals in class java.lang.Object
See Also:
Object.equals(java.lang.Object)
public org.apache.lucene.analysis.Analyzer getAnalyzer()
public java.util.Map<java.lang.String,java.lang.Object> getConfiguration()
Description copied from interface: I_CmsConfigurationParameterHandler
Returns the configuration of this parameter configurable class instance, or null if the class does not need to be configured.
All elements in the configuration are key/value String pairs, set using the I_CmsConfigurationParameterHandler.addConfigurationParameter(String, String) method during initialization of the loader.
Implementations should not return a direct reference to the internal configuration but only a copy of it, to avoid unwanted external manipulation.
Specified by: getConfiguration in interface I_CmsConfigurationParameterHandler
Returns:
the configuration of this parameter configurable class instance, or null
See Also:
I_CmsConfigurationParameterHandler.getConfiguration()
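The advice to hand out a copy rather than the internal map can be sketched as follows. This is an illustration under assumptions, not the OpenCms code: `ConfigurationHolder` is a hypothetical class, and returning null for an empty parameter set is only one possible reading of "does not need to be configured".

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical parameter holder; illustrates the copy-on-read advice only. */
final class ConfigurationHolder {
    private final Map<String, Object> parameters = new TreeMap<>();

    void addConfigurationParameter(String key, String value) {
        parameters.put(key, value);
    }

    /**
     * Returns a copy of the parameters so callers cannot modify internal state.
     * Assumption: null is returned when no parameter has been added.
     */
    Map<String, Object> getConfiguration() {
        if (parameters.isEmpty()) {
            return null;
        }
        return new TreeMap<>(parameters);
    }

    public static void main(String[] args) {
        ConfigurationHolder holder = new ConfigurationHolder();
        holder.addConfigurationParameter("exampleKey", "exampleValue");
        Map<String, Object> copy = holder.getConfiguration();
        copy.clear(); // only the copy is emptied
        System.out.println(holder.getConfiguration());
    }
}
```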
public org.apache.lucene.document.Document getDocument(java.lang.String rootPath)
Parameters:
rootPath - the root path of the document to get
public I_CmsDocumentFactory getDocumentFactory(CmsResource res)
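Returns the document type factory used for the given resource in this index, or null in case the resource is not indexed by this index.
A resource is indexed if the following is all true:
Parameters:
res - the resource to check
Returns:
the document type factory used for the given resource in this index, or null in case the resource is not indexed by this index
public CmsSearchFieldConfiguration getFieldConfiguration()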
public java.lang.String getFieldConfigurationName()
public org.apache.lucene.index.IndexWriter getIndexWriter(boolean create) throws CmsIndexException
Parameters:
create - if true a whole new index is created, if false an existing index is updated
Throws:
CmsIndexException - if the index can not be opened
public java.util.Locale getLocale()
public java.lang.String getLocaleString()
See Also:
getLocale()
public int getMaxHits()
Since Lucene 2.4, the maximum number of documents to load from the index must be specified. The default of this setting is MAX_HITS_DEFAULT (5000).
This means that at most 5000 results are returned from the index. Please note that this number may be reduced further because of OpenCms read permissions or per-user file visibility settings not controlled in the index.
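Why the final result count can fall below the configured maximum may be easier to see in code. The sketch below is an assumption-laden illustration, not the OpenCms search routine: `HitLimitSketch`, `visibleHits` and the `canRead` predicate are hypothetical, and hits are represented by plain path strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class HitLimitSketch {
    /**
     * Loads at most maxHits entries from the ranked hits, then drops those the
     * user may not read. The result can therefore be shorter than maxHits.
     */
    static List<String> visibleHits(List<String> rankedHits, int maxHits, Predicate<String> canRead) {
        List<String> result = new ArrayList<>();
        int limit = Math.min(maxHits, rankedHits.size());
        for (int i = 0; i < limit; i++) {
            String path = rankedHits.get(i);
            if (canRead.test(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("/a", "/secret/b", "/c", "/d");
        // Hypothetical permission rule: everything below /secret is unreadable.
        System.out.println(visibleHits(hits, 3, p -> !p.startsWith("/secret")));
    }
}
```

In this sketch the limit is applied before the permission filter, so an unreadable document still consumes one of the loaded hits.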
public java.lang.String getName()
public java.lang.String getPath()
public int getPriority()
public java.lang.String getProject()
public java.lang.String getRebuildMode()
public org.apache.lucene.search.IndexSearcher getSearcher()
public java.util.List<java.lang.String> getSourceNames()
public java.util.List<CmsSearchIndexSource> getSources()
public int hashCode()
Overrides: hashCode in class java.lang.Object
See Also:
Object.hashCode()
public void initConfiguration()
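Description copied from interface: I_CmsConfigurationParameterHandler
Initializes a configuration after all parameters have been added.
Specified by: initConfiguration in interface I_CmsConfigurationParameterHandler
See Also:
I_CmsConfigurationParameterHandler.initConfiguration()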
public void initialize() throws CmsSearchException
Throws:
CmsSearchException - if the index source association failed
public boolean isBackupReindexing()
Returns true if backup re-indexing is done by this index.
This is an optimization method by which the old extracted content is reused in order to save performance when re-indexing.
Returns:
true if backup re-indexing is done by this index
public boolean isCheckingPermissions()
Returns true if permissions are checked for search results by this index.
If permission checks are not required, they can be turned off in the index search configuration parameters in opencms-search.xml. Not checking permissions will improve performance.
This can be of use in scenarios where you know that all search results are always readable, which is usually true for public websites that do not have personalized accounts.
Please note that even if a result is returned for which the current user has no read permissions, the user cannot actually access this document. It will only appear in the search result list; if the user clicks the link to open the document, an error is shown.
Returns:
true if permissions are checked for search results by this index
public boolean isCheckingTimeRange()
Returns true if the document time range is checked for search results by this index.
If time range checks are not required, they can be turned off in the index search configuration parameters in opencms-search.xml. Not checking the time range will improve performance.
Returns:
true if the document time range is checked for search results by this index
public boolean isCreatingExcerpt()
Returns true if an excerpt is generated by this index.
If no excerpt is required, generation can be turned off in the index search configuration parameters in opencms-search.xml. Not generating an excerpt will improve performance.
Returns:
true if an excerpt is generated by this index
public boolean isEnabled()
Returns true if this index is currently enabled.
Returns:
true if this index is currently enabled
public boolean isExtractingContent()
Returns true if full text is extracted by this index.
Full text content extraction can be turned off in the index search configuration parameters in opencms-search.xml.
Not extracting the full text information will greatly improve performance.
Returns:
true if full text is extracted by this index
public void removeSourceName(java.lang.String sourceName)
Parameters:
sourceName - the index source name to remove
public CmsSearchResultList search(CmsObject cms, CmsSearchParameters params) throws CmsSearchException
Performs a search on the index within the given fields.
The result is returned as a List with entries of type I_CmsSearchResult.
Parameters:
cms - the current user's Cms object
params - the parameters to use for the search
Throws:
CmsSearchException - if something goes wrong
public void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
Parameters:
analyzer - the Lucene analyzer to set
public void setEnabled(boolean enabled)
Parameters:
enabled - the state of the index to set
public void setFieldConfiguration(CmsSearchFieldConfiguration fieldConfiguration)
Parameters:
fieldConfiguration - the field configuration to set
public void setFieldConfigurationName(java.lang.String fieldConfigurationName)
Parameters:
fieldConfigurationName - the name of the field configuration to set
public void setLocale(java.util.Locale locale)
Parameters:
locale - the locale to index resources
public void setLocaleString(java.lang.String locale)
Parameters:
locale - the locale to index resources
See Also:
setLocale(Locale)
public void setMaxHits(int maxHits)
This must be set to at least 50, or this setting is ignored.
Parameters:
maxHits - the maximum number of hits to load
See Also:
getMaxHits()
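The lower bound on this setting can be sketched as a simple guard. This is a minimal illustration, not the OpenCms source: the class name and `MIN_MAX_HITS` are hypothetical, and "ignored" is interpreted here as "the previous value is kept", which is an assumption.

```java
final class MaxHitsSetting {
    static final int MIN_MAX_HITS = 50;       // lower bound stated in the text (hypothetical constant name)
    static final int MAX_HITS_DEFAULT = 5000; // default stated for getMaxHits()

    private int maxHits = MAX_HITS_DEFAULT;

    /** Ignores values below the lower bound and keeps the previous setting (assumed behavior). */
    void setMaxHits(int newMaxHits) {
        if (newMaxHits >= MIN_MAX_HITS) {
            maxHits = newMaxHits;
        }
    }

    int getMaxHits() {
        return maxHits;
    }

    public static void main(String[] args) {
        MaxHitsSetting setting = new MaxHitsSetting();
        setting.setMaxHits(10);  // below the bound, so the default stays in place
        System.out.println(setting.getMaxHits());
        setting.setMaxHits(200); // accepted
        System.out.println(setting.getMaxHits());
    }
}
```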
public void setName(java.lang.String name) throws CmsIllegalArgumentException
Parameters:
name - the logical key/name of this search index
Throws:
CmsIllegalArgumentException - if the given name is null, empty or already taken by another search index
public void setProject(java.lang.String projectName)
A duplicate method of setProjectName(String) that allows instances of this class to be used as a widget object (bean convention, cp.: getProject()).
Parameters:
projectName - the name of the project used to index resources
public void setProjectName(java.lang.String projectName)
Parameters:
projectName - the name of the project used to index resources
public void setRebuildMode(java.lang.String rebuildMode)
Parameters:
rebuildMode - the rebuild mode of this search index {auto|manual}
public void shutDown() throws java.io.IOException
This will close the local Lucene index searcher instance.
Throws:
java.io.IOException - in case the index could not be closed
public java.lang.String toString()
Returns the name (getName()) of this search index.
Overrides: toString in class java.lang.Object
Returns:
the name (getName()) of this search index
See Also:
Object.toString()
protected java.lang.String createIndexBackup()
Returns:
null in case no backup was created
protected void extendPathFilter(org.apache.lucene.search.TermsFilter pathFilter, java.lang.String searchRoot)
Parameters:
pathFilter - the path filter to extend
searchRoot - the search root to add to the path query
protected org.apache.lucene.search.Filter getMultiTermQueryFilter(java.lang.String field, java.util.List<java.lang.String> terms)
Parameters:
field - the field to use
terms - the terms to use
protected org.apache.lucene.search.Filter getMultiTermQueryFilter(java.lang.String field, java.lang.String terms)
Parameters:
field - the field to use
terms - the terms to use
protected org.apache.lucene.search.Filter getMultiTermQueryFilter(java.lang.String field, java.lang.String termsStr, java.util.List<java.lang.String> termsList)
Parameters:
field - the field to use
termsStr - the terms to use as a String separated by a space ' ' char
termsList - the list of terms to use
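How a cached filter lookup might accept the terms either as a space-separated String or as a list can be sketched as follows. This is a guess at the general technique, not the OpenCms or Lucene implementation: `TermFilterCacheSketch`, `TermFilter` and the cache key format are hypothetical, and it is assumed that whichever of the two term arguments is null gets derived from the other.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TermFilterCacheSketch {
    /** Hypothetical stand-in for a Lucene filter: a field plus the accepted terms. */
    static final class TermFilter {
        final String field;
        final List<String> terms;

        TermFilter(String field, List<String> terms) {
            this.field = field;
            this.terms = terms;
        }
    }

    private final Map<String, TermFilter> cache = new HashMap<>();

    /** Either termsStr or termsList may be null; the missing form is derived from the other. */
    synchronized TermFilter getMultiTermFilter(String field, String termsStr, List<String> termsList) {
        if (termsStr == null) {
            termsStr = String.join(" ", termsList);
        }
        String key = field + ":" + termsStr; // hypothetical cache key format
        TermFilter filter = cache.get(key);
        if (filter == null) {
            if (termsList == null) {
                termsList = Arrays.asList(termsStr.split(" "));
            }
            filter = new TermFilter(field, termsList);
            cache.put(key, filter);
        }
        return filter;
    }

    public static void main(String[] args) {
        TermFilterCacheSketch sketch = new TermFilterCacheSketch();
        TermFilter first = sketch.getMultiTermFilter("type", "article news", null);
        TermFilter second = sketch.getMultiTermFilter("type", null, Arrays.asList("article", "news"));
        System.out.println("same cached instance: " + (first == second));
    }
}
```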
protected org.apache.lucene.search.Filter getTermQueryFilter(java.lang.String field, java.lang.String term)
Parameters:
field - the field to use
term - the term to use
protected boolean hasReadPermission(CmsObject cms, org.apache.lucene.document.Document doc)
Parameters:
cms - the OpenCms user context to use for permission testing
doc - the search result document to check
Returns:
true if the user has read permissions to the resource
protected void indexSearcherClose()
See Also:
indexSearcherOpen(String)
protected void indexSearcherOpen(java.lang.String path)
Use getSearcher() in order to obtain the searcher that has been opened.
In case there is an index searcher still open, it is closed first.
For performance reasons, one instance of the Lucene index searcher should be kept for all searches. However, if the index is updated or changed, this searcher instance needs to be re-initialized.
Parameters:
path - the path to the index directory
protected boolean isInTimeRange(org.apache.lucene.document.Document doc, CmsSearchParameters params)
The creation date and/or the last modification date are checked.
Parameters:
doc - the document to check the dates against the given time range
params - the search parameters where the time ranges are specified
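A check of the creation date and/or the last modification date against configured ranges could look roughly like the sketch below. It is not the OpenCms implementation: `TimeRangeSketch` and `Range` are hypothetical, dates are plain epoch milliseconds, and an unbounded range is assumed to stand for "this date is not checked".

```java
final class TimeRangeSketch {
    /** Hypothetical inclusive range; Long.MIN_VALUE and Long.MAX_VALUE mean "unbounded". */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(long timestamp) {
            return timestamp >= min && timestamp <= max;
        }
    }

    /** A document passes only if both dates fall into their respective ranges. */
    static boolean isInTimeRange(long created, long lastModified, Range createdRange, Range modifiedRange) {
        return createdRange.contains(created) && modifiedRange.contains(lastModified);
    }

    public static void main(String[] args) {
        Range unbounded = new Range(Long.MIN_VALUE, Long.MAX_VALUE);
        Range window = new Range(1000L, 2000L);
        // Only the last modification date is restricted in this call.
        System.out.println(isInTimeRange(500L, 1500L, unbounded, window));
    }
}
```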
protected void removeIndexBackup(java.lang.String path)
Parameters:
path - the backup folder to remove
See Also:
isBackupReindexing()