Shruti Poddar

Allows highlighting search results on one or more fields. The implementation uses either the plain Lucene highlighter, the fast vector highlighter or the postings highlighter. The following is an example of the search request body:
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}


In the above case, the content field will be highlighted for each search hit. Each search hit will contain an additional element, called highlight, which includes the highlighted fields and the highlighted fragments.
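As an illustration of where the fragments end up, the following sketch reads them back from a response. The response dict is made up and trimmed to the relevant keys, and `fragments_by_id` is a hypothetical helper. The sketch assumes the highlight element maps each field name to a list of fragment strings.

```python
# Hypothetical, trimmed-down search response (real ones carry more metadata).
response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"content": ["the <em>quick</em> brown fox"]}},
            {"_id": "2"},  # this sketch tolerates hits without a highlight element
        ]
    }
}


def fragments_by_id(resp, field):
    """Map each hit's _id to the list of highlighted fragments for `field`."""
    result = {}
    for hit in resp["hits"]["hits"]:
        # highlight: field name -> list of highlighted fragments
        result[hit["_id"]] = hit.get("highlight", {}).get(field, [])
    return result


print(fragments_by_id(response, "content"))
```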


Note


In order to perform highlighting, the actual content of the field is required. If the field is stored (has store set to true in the mapping), the stored value is used. Otherwise, the document's _source is loaded and the relevant field is extracted from it.

The _all field cannot be extracted from _source, so it can only be used for highlighting if it is mapped with store set to true.
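The rules above for where the highlighted text comes from can be modelled as a small decision function. This is a sketch, not Elasticsearch code. The function name, its parameters and the dict shapes used for the mapping, stored fields and _source are all hypothetical.

```python
def highlight_source(field, mapping, stored_fields, source):
    """Return the text to highlight for `field` (hypothetical model of the rules).

    mapping: field name -> mapping options, e.g. {"store": True}
    stored_fields: stored field values for this document
    source: the document's _source as a dict
    """
    if mapping.get(field, {}).get("store", False):
        # Stored field: its stored value is used directly.
        return stored_fields[field]
    if field == "_all":
        # _all is not part of _source, so without store it cannot be highlighted.
        raise ValueError("_all must be mapped with store set to true to be highlighted")
    # Otherwise the field is extracted from the loaded _source.
    return source[field]
```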

The field name supports wildcard notation. For example, using comment* will cause all fields that match the expression to be highlighted.
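To see which fields a pattern such as comment* would select, the sketch below approximates the expansion with Python's `fnmatch.fnmatchcase`. This is an assumption: Python's shell-style matching also treats `?` and `[...]` specially, so only `*` should be relied on here. The field list and `expand_highlight_fields` are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_highlight_fields(pattern, field_names):
    """Return the field names that a wildcard highlight field would match."""
    return [name for name in field_names if fnmatchcase(name, pattern)]


fields = ["comment", "comment_author", "comments", "content"]
print(expand_highlight_fields("comment*", fields))
```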





Postings highlighter

If index_options is set to offsets in the mapping, the postings highlighter will be used instead of the plain highlighter. The postings highlighter:

• Is faster since it doesn't need to reanalyze the text to be highlighted: the larger the documents, the greater the expected performance gain
• Requires less disk space than term_vectors, which the fast vector highlighter needs
• Breaks the text into sentences and highlights them. Works well with natural language, less so with fields containing, for instance, HTML markup
• Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm
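The last point, scoring sentences as if they were documents, can be sketched in plain BM25. The sketch uses a naive sentence splitter and tokenizer, and assumes the common textbook parameters k1 = 1.2 and b = 0.75. It shows the idea only. Lucene's actual passage scorer differs in details such as length normalization, and query terms are expected in lowercase here.

```python
import math
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase word tokens; a stand-in for a real analyzer.
    return re.findall(r"\w+", sentence.lower())


def score_sentences(text, query_terms, k1=1.2, b=0.75):
    """BM25-score each sentence, treating the document's sentences as the corpus."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    corpus = [tokenize(s) for s in sentences]
    n = len(corpus)
    avg_len = sum(len(tokens) for tokens in corpus) / n
    results = []
    for sentence, tokens in zip(sentences, corpus):
        score = 0.0
        for term in query_terms:
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Document frequency = number of sentences containing the term.
            df = sum(1 for other in corpus if term in other)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        results.append((sentence, score))
    return results
```

Sentences that contain more (and rarer) query terms rank higher, which is what lets the postings highlighter pick its best fragments without reanalyzing the text.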

Here is an example of a mapping that enables highlighting with the postings highlighter on the content field:
{
    "type_name" : {
        "content" : {"index_options" : "offsets"}
    }
}



Note


The postings highlighter is meant to perform simple query term highlighting, regardless of term positions. When used in combination with a phrase query, for instance, it highlights all the terms the query is composed of, whether or not they are actually part of a query match, effectively ignoring their positions.
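The effect on a phrase query can be shown with a toy highlighter that wraps every occurrence of any query term. The function and the `<em>` tags are illustrative assumptions, not Elasticsearch code. For the phrase "quick fox", both words are highlighted in the example text even though the phrase itself never occurs.

```python
import re


def highlight_terms(text, query_terms, pre="<em>", post="</em>"):
    """Wrap every occurrence of any query term, ignoring term positions."""
    terms = {t.lower() for t in query_terms}

    def wrap(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


print(highlight_terms("A quick brown fox, then a fox that is quick", ["quick", "fox"]))
```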


Warning


The postings highlighter does support highlighting of multi term queries, such as prefix queries and wildcard queries. However, this requires the queries to be rewritten using a rewrite method that supports multi term extraction, which is a potentially expensive operation.
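To see why the rewrite can be expensive, consider what it does for a prefix query. The query is expanded into every concrete term in the index that starts with the prefix, so the cost grows with the number of matching terms. The sketch uses a sorted in-memory list as a stand-in for the index's term dictionary, and `rewrite_prefix` is hypothetical.

```python
from bisect import bisect_left


def rewrite_prefix(prefix, term_dictionary):
    """Expand a prefix query into the concrete terms it matches.

    term_dictionary must be sorted: matching terms then form one contiguous
    run starting at the first term >= prefix.
    """
    start = bisect_left(term_dictionary, prefix)
    terms = []
    for term in term_dictionary[start:]:
        if not term.startswith(prefix):
            break
        terms.append(term)
    return terms


terms = sorted(["quack", "quick", "quicker", "quickly", "quiet", "quill"])
print(rewrite_prefix("quick", terms))
```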





Fast vector highlighter

If term_vector information is provided by setting term_vector to with_positions_offsets in the mapping, the fast vector highlighter will be used instead of the plain highlighter. The fast vector highlighter:

• Is faster, especially for large fields (> 1MB)
• Can be customized with boundary_chars, boundary_max_scan, and fragment_offset (see below)
• Requires setting term_vector to with_positions_offsets which increases the size of the index
• Can combine matches from multiple fields into one result. See matched_fields
• Can assign different weights to matches at different positions, allowing, for example, phrase matches to be sorted above term matches when highlighting a Boosting Query that boosts phrase matches over term matches
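The boundary_chars and boundary_max_scan options control how far a fragment edge may move to reach a natural break. The sketch below is modelled loosely on a simple boundary scanner. Its default values (boundary characters `.,!? \t\n` and a scan limit of 20) are assumptions for illustration, and the function names are hypothetical.

```python
DEFAULT_BOUNDARY_CHARS = ".,!? \t\n"  # assumed defaults for illustration


def find_start_offset(text, start, boundary_chars=DEFAULT_BOUNDARY_CHARS, max_scan=20):
    """Move a fragment start back to just after the nearest boundary char.

    Scans at most `max_scan` characters backwards; if no boundary char is
    found in that window, the original start offset is kept.
    """
    for offset in range(start - 1, max(start - max_scan, 0) - 1, -1):
        if text[offset] in boundary_chars:
            return offset + 1
    return start


def find_end_offset(text, end, boundary_chars=DEFAULT_BOUNDARY_CHARS, max_scan=20):
    """Move a fragment end forward to the nearest boundary char, within max_scan."""
    for offset in range(end, min(end + max_scan, len(text))):
        if text[offset] in boundary_chars:
            return offset
    return end


text = "Hello world. The quick brown fox"
# A raw offset in the middle of "quick" is widened to the whole word.
start, end = find_start_offset(text, 19), find_end_offset(text, 19)
print(text[start:end])
```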

Here is an example of a mapping that enables highlighting with the fast vector highlighter on the content field (this will cause the index to be bigger):
{
    "type_name" : {
        "content" : {"term_vector" : "with_positions_offsets"}
    }
}
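The two mapping examples differ only in the option set on the field. The sketch below captures that difference in a hypothetical helper, `highlighter_mapping`. It mirrors the abbreviated shape used in the examples above. A complete mapping would normally nest the field under properties and declare its type.

```python
import json


def highlighter_mapping(type_name, field, highlighter):
    """Build an abbreviated mapping enabling the postings or fvh highlighter."""
    options = {
        "postings": {"index_options": "offsets"},
        "fvh": {"term_vector": "with_positions_offsets"},
    }
    if highlighter not in options:
        raise ValueError(f"no mapping option sketched for {highlighter!r}")
    return {type_name: {field: dict(options[highlighter])}}


print(json.dumps(highlighter_mapping("type_name", "content", "fvh"), indent=4))
```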






Force highlighter type

The type field allows forcing a specific highlighter type. This is useful, for instance, when the plain highlighter is needed on a field that has term_vectors enabled. The allowed values are: plain, postings and fvh. The following is an example that forces the use of the plain highlighter:
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"type" : "plain"}
        }
    }
}
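When request bodies are built programmatically, the allowed values can be checked before the request is sent. The sketch below builds a body of the same shape as the example above. `highlight_request` and `ALLOWED_TYPES` are hypothetical names, and the match query is only a placeholder.

```python
ALLOWED_TYPES = ("plain", "postings", "fvh")


def highlight_request(query, field, highlighter_type=None):
    """Build a search body highlighting `field`, optionally forcing a highlighter."""
    field_options = {}
    if highlighter_type is not None:
        if highlighter_type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}, got {highlighter_type!r}")
        field_options["type"] = highlighter_type
    return {"query": query, "highlight": {"fields": {field: field_options}}}


body = highlight_request({"match": {"content": "quick fox"}}, "content", "plain")
```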






Force highlighting on source


Note


Added in 1.0.0.RC1.

Forces highlighting to be based on the _source, even if the fields are stored separately. Defaults to false.
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"force_source" : true}
        }
    }
}
