Installing and Using Elasticsearch Plugins
What are Elasticsearch Plugins?
Elasticsearch is an open source, scalable search engine. Although Elasticsearch supports a large number of features out-of-the-box, it can also be extended with a variety of plugins to provide advanced analytics and process different data types.
This guide shows how to install the following Elasticsearch plugins and interact with them using the Elasticsearch API:
- ingest-attachment: allows Elasticsearch to index and search base64-encoded documents in formats such as RTF, PDF, and PPT.
- analysis-phonetic: identifies search results that sound similar to the search term.
- ingest-geoip: adds location information to indexed documents based on any IP addresses within the document.
- ingest-user-agent: parses the User-Agent header of HTTP requests to provide identifying information about the client that sent each request.
Note
This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, you can check our Users and Groups guide.
Before You Begin
If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with Linode and Creating a Compute Instance guides.
Follow our Setting Up and Securing a Compute Instance guide to update your system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
Installation
Java
As of this writing, Elasticsearch requires Java 8.
OpenJDK 8 is available from the official repositories. Install the headless OpenJDK 8 package:
sudo apt install openjdk-8-jre-headless
Confirm that Java is installed:
java -version
The output should be similar to:
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Elasticsearch
Install the official Elastic APT package signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Install the apt-transport-https package, which is required to retrieve deb packages served over HTTPS:
sudo apt-get install apt-transport-https
Add the APT repository information to your server’s list of sources:
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic.list
Update the list of available packages:
sudo apt-get update
Install the elasticsearch package:
sudo apt-get install -y elasticsearch
Set the JVM heap size to approximately half of your server’s available memory. For example, if your server has 1GB of RAM, change the Xms and Xmx values in the /etc/elasticsearch/jvm.options file to 512m. Leave the other values in this file unchanged:
File: /etc/elasticsearch/jvm.options
-Xms512m
-Xmx512m
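The halving rule above can be written as a small calculation. The following is a minimal sketch in Python and is not part of the Elasticsearch tooling. The function name jvm_heap_options is hypothetical, and the sketch assumes you already know the server's total memory in megabytes.

```python
def jvm_heap_options(total_memory_mb):
    """Return the two jvm.options lines for a heap of about half of total memory."""
    if total_memory_mb <= 0:
        raise ValueError("total_memory_mb must be positive")
    heap_mb = total_memory_mb // 2  # whole megabytes, rounded down
    # Xms (initial heap) and Xmx (maximum heap) are set to the same value.
    return ["-Xms{}m".format(heap_mb), "-Xmx{}m".format(heap_mb)]


# A server with 1GB (1024MB) of RAM, as in the example above.
for line in jvm_heap_options(1024):
    print(line)
```

The printed lines are the values to place in /etc/elasticsearch/jvm.options in place of the existing Xms and Xmx entries.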
Enable and start the elasticsearch service:
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Wait a few moments for the service to start, then confirm that the Elasticsearch API is available:
curl localhost:9200
The Elasticsearch REST API should return a JSON response similar to the following:
{
  "name" : "Sch1T0D",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "MH6WKAm0Qz2r8jFK-TcbNg",
  "version" : {
    "number" : "6.1.1",
    "build_hash" : "bd92e7f",
    "build_date" : "2017-12-17T20:23:25.338Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
To determine whether or not the service has started successfully, view the most recent logs:
systemctl status elasticsearch
You are now ready to install and use Elasticsearch plugins.
Elasticsearch Plugins
The remainder of this guide walks through several plugins and common use cases. Many of the following steps involve communicating with the Elasticsearch API. For example, to index a sample document into Elasticsearch, a POST request with a JSON payload must be sent to /{index name}/{type}/{document id}:
POST /exampleindex/doc/1
{
  "message": "this is the value for the message field"
}
There are a number of tools that can be used to issue this request. The simplest approach is to use curl from the command line:
curl -H 'Content-Type: application/json' -XPOST localhost:9200/exampleindex/doc/1 -d '{ "message": "this is the value for the message field" }'
Other alternatives include the vim-rest-console, the Emacs plugin es-mode, or the Console plugin for Kibana. Use whichever tool is most convenient for you.
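If you prefer to script these requests, the same call can be made with the Python standard library. The following is a minimal sketch, not an official client. The helper names build_request and send are hypothetical, and BASE_URL assumes Elasticsearch is listening on localhost:9200 as configured earlier in this guide.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable at this address.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch API."""
    data = None
    headers = {}
    if body is not None:
        # Elasticsearch 6.x expects an explicit JSON content type on request bodies.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running Elasticsearch instance):
# request = build_request("POST", "/exampleindex/doc/1",
#                         {"message": "this is the value for the message field"})
# print(send(request))
```

Building the request separately from sending it makes it easy to inspect the method, URL, and payload before anything reaches the cluster.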
Prepare an Index
Before installing any plugins, create a test index.
Create an index named test with one shard and no replicas:
PUT /test
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    }
  }
}
Note
These settings are suitable for testing, but additional shards and replicas should be used in a production environment.
Add an example document to the index:
POST /test/doc/1
{
  "message": "this is an example document"
}
Searches can be performed by using the _search URL endpoint. Search for “example” in the message field across all documents:
POST /_search
{
  "query": {
    "terms": {
      "message": ["example"]
    }
  }
}
The Elasticsearch API should return the matching document.
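Matching documents are returned inside the hits.hits array of the response, with the original JSON of each document under its _source key. The following minimal Python sketch pulls those documents out. The function name matching_sources is hypothetical, and sample_response is a hand-written, trimmed illustration rather than captured output; a real response carries additional fields.

```python
def matching_sources(search_response):
    """Return the _source of every hit in a search response."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Hand-written, trimmed response for illustration only.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "test",
                "_type": "doc",
                "_id": "1",
                "_source": {"message": "this is an example document"},
            }
        ],
    }
}

print(matching_sources(sample_response))
```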
Elasticsearch Attachment Plugin
The attachment plugin lets Elasticsearch accept a base64-encoded document and index its contents for easy searching. This is useful for searching PDF or rich text documents with minimal overhead.
Install the ingest-attachment plugin using the elasticsearch-plugin tool:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm that the plugin is installed as expected by using the _cat API:
GET /_cat/plugins
The ingest-attachment plugin should appear in the list of installed plugins.
In order to use the attachment plugin, a pipeline must be used to process base64-encoded data in the field of a document. An ingest pipeline is a way of performing additional steps when indexing a document in Elasticsearch. While Elasticsearch comes pre-installed with some pipeline processors (which can perform actions such as removing or adding fields), the attachment plugin installs an additional processor that can be used when defining a pipeline.
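Conceptually, a pipeline is an ordered list of processors, each of which receives a document and returns a modified copy. The following Python sketch is a toy model of that idea only. It is not how Elasticsearch implements pipelines, and the names set_field, remove_field, and run_pipeline are hypothetical.

```python
def set_field(field, value):
    """Return a processor that adds (or overwrites) a field."""
    def processor(doc):
        updated = dict(doc)
        updated[field] = value
        return updated
    return processor


def remove_field(field):
    """Return a processor that drops a field if it is present."""
    def processor(doc):
        return {key: value for key, value in doc.items() if key != field}
    return processor


def run_pipeline(processors, doc):
    """Apply each processor in order and return the transformed document."""
    for processor in processors:
        doc = processor(doc)
    return doc


pipeline = [set_field("environment", "test"), remove_field("debug")]
print(run_pipeline(pipeline, {"message": "hello", "debug": True}))
```

The attachment processor fits the same pattern: it reads one field of the incoming document and adds a new field containing the extracted text.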
Create a pipeline called doc-parser which takes data from a field called encoded_doc and executes the attachment processor on the field:
PUT /_ingest/pipeline/doc-parser
{
  "description" : "Extract text from base-64 encoded documents",
  "processors" : [
    {
      "attachment" : {
        "field" : "encoded_doc"
      }
    }
  ]
}
The doc-parser pipeline can now be specified when indexing documents to extract data from the encoded_doc field.
Note
By default, the attachment processor will create a new field called attachment with the parsed content of the target field. See the attachment processor documentation for additional information.
Index an example RTF (Rich Text Format) document. The following string is a base64-encoded RTF document containing the text “Hello from inside of a rich text RTF document”, which we would like to search:
e1xydGYxXGFuc2kKSGVsbG8gZnJvbSBpbnNpZGUgb2YgYSByaWNoIHRleHQgUlRGIGRvY3VtZW50LgpccGFyIH0K
Add this document to the test index, using the ?pipeline=doc-parser parameter to specify the new pipeline:
PUT /test/doc/rtf?pipeline=doc-parser
{
  "encoded_doc": "e1xydGYxXGFuc2kKSGVsbG8gZnJvbSBpbnNpZGUgb2YgYSByaWNoIHRleHQgUlRGIGRvY3VtZW50LgpccGFyIH0K"
}
Search for the term “rich”, which should return the indexed document:
POST /_search
{
  "query": {
    "terms": {
      "attachment.content": ["rich"]
    }
  }
}
This technique may be used to index and search other document types including PDF, PPT, and XLS. See the Apache Tika Project (which provides the underlying text extraction implementation) for additional supported file formats.
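To prepare your own documents for the pipeline, the file contents must be base64-encoded before they are placed in the encoded_doc field. The following minimal Python sketch decodes the sample string used in this guide and shows how a request body could be built from raw bytes. The helper names encode_document and build_attachment_body are hypothetical.

```python
import base64

# The base64 string used in this guide.
ENCODED_DOC = "e1xydGYxXGFuc2kKSGVsbG8gZnJvbSBpbnNpZGUgb2YgYSByaWNoIHRleHQgUlRGIGRvY3VtZW50LgpccGFyIH0K"


def encode_document(raw_bytes):
    """Return the base64 text representation of a document's raw bytes."""
    return base64.b64encode(raw_bytes).decode("ascii")


def build_attachment_body(raw_bytes, field="encoded_doc"):
    """Build the JSON body expected by a pipeline that reads the given field."""
    return {field: encode_document(raw_bytes)}


# Decode the sample to inspect the RTF markup that Elasticsearch will parse.
decoded = base64.b64decode(ENCODED_DOC)
print(decoded.decode("ascii"))
```

For a file on disk, read it in binary mode (for example with open(path, "rb")) and pass the resulting bytes to build_attachment_body.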
Phonetic Analysis Plugin
Elasticsearch excels at analyzing textual data. Several analyzers come bundled with Elasticsearch, and plugins can add more.
The Phonetic Analysis plugin is one such addition. By using this plugin, it is possible to search for terms that sound similar to other words.
Install the analysis-phonetic plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-phonetic
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm that the plugin has been successfully installed:
GET /_cat/plugins
In order to use this plugin, the following changes must be made to the test index:
- A filter must be created. This filter will be used to process the tokens that are created for fields of an indexed document.
- This filter will be used by an analyzer. An analyzer determines how a field is tokenized and how those tokenized items are processed by filters.
- Finally, we will configure the test index to use this analyzer for a field in the index with a mapping.
An index must be closed before analyzers and filters can be added.
Close the test index:
POST /test/_close
Define the analyzer and filter for the test index under the _settings API:
PUT /test/_settings
{
  "analysis": {
    "analyzer": {
      "my_phonetic_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "standard",
          "lowercase",
          "my_phonetic_filter"
        ]
      }
    },
    "filter": {
      "my_phonetic_filter": {
        "type": "phonetic",
        "encoder": "metaphone",
        "replace": false
      }
    }
  }
}
Re-open the index to enable searching and indexing:
POST /test/_open
Define a mapping for a field named phonetic which will use the my_phonetic_analyzer analyzer:
POST /test/_mapping/doc
{
  "properties": {
    "phonetic": {
      "type": "text",
      "analyzer": "my_phonetic_analyzer"
    }
  }
}
Index a document with a JSON field called phonetic with content that should be passed through the phonetic analyzer:
POST /test/doc
{
  "phonetic": "black leather ottoman"
}
Perform a match search for the term “ottoman”. Instead of spelling the term correctly, use a misspelling that is phonetically similar:
POST /_search
{
  "query": {
    "match": {
      "phonetic": "otomen"
    }
  }
}
The phonetic analysis plugin should be able to recognize that “otomen” and “ottoman” are phonetically similar, and return the correct result.
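To see why two different spellings can match, it helps to look at how a phonetic encoder reduces words to a code. The filter above uses the metaphone encoder, which is too long to reproduce here, so the following Python sketch substitutes the simpler Soundex algorithm purely as an illustration of the idea. It is not the code the plugin runs.

```python
SOUNDEX_GROUPS = (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
)
SOUNDEX_CODES = {ch: digit for letters, digit in SOUNDEX_GROUPS for ch in letters}


def soundex(word):
    """Return the four-character Soundex code for a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code to ""
    return (result + "000")[:4]  # pad with zeros, keep four characters


print(soundex("ottoman"), soundex("otomen"))
```

Both spellings reduce to the same code, which is the property a phonetic filter relies on when it indexes encoded tokens alongside, or instead of, the original text.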
Geoip Processor Plugin
When indexing documents such as log files, some fields may contain IP addresses. The Geoip plugin can process IP addresses in order to enrich documents with location data.
Install the plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm the plugin is installed by checking the API:
GET /_cat/plugins
As with the ingest-attachment pipeline plugin, the ingest-geoip plugin is used as a processor within an ingest pipeline. The Geoip plugin documentation outlines the available settings when creating processors within a pipeline.
Create a pipeline called parse-ip which consumes an IP address from a field called ip and creates regional information underneath the default field (geoip):
PUT /_ingest/pipeline/parse-ip
{
  "description" : "Geolocate an IP address",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
Add a mapping to the index to indicate that the ip field should be stored as an IP address in the underlying storage engine:
POST /test/_mapping/doc
{
  "properties": {
    "ip": {
      "type": "ip"
    }
  }
}
Index a document with the ip field set to an example address, and pass pipeline=parse-ip in the request so that the parse-ip pipeline processes the document:
PUT /test/doc/ipexample?pipeline=parse-ip
{
  "ip": "8.8.8.8"
}
Retrieve the document to view the fields created by the pipeline:
GET /test/doc/ipexample
The response should include a geoip JSON key with fields such as city_name derived from the source IP address. The plugin should correctly determine that the IP address is located in California.
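A pipeline definition can also be tried against sample documents without indexing anything, using the ingest simulate endpoint (POST /_ingest/pipeline/_simulate). The following Python sketch only builds the request body for such a call. The function name simulate_body is hypothetical, and the sketch assumes the simulate endpoint is available in your Elasticsearch version.

```python
import json


def simulate_body(processors, sources):
    """Build a request body for POST /_ingest/pipeline/_simulate."""
    return {
        "pipeline": {"processors": processors},
        # Each sample document is wrapped in a _source key.
        "docs": [{"_source": source} for source in sources],
    }


body = simulate_body([{"geoip": {"field": "ip"}}], [{"ip": "8.8.8.8"}])
print("POST /_ingest/pipeline/_simulate")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into whichever tool you are using to talk to the API. The response shows each sample document as it would look after the processors have run.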
User Agent Processor Plugin
A common use case for Elasticsearch is to index log files. By parsing certain fields from web server access logs, requests can be more effectively searched by response code, URL, and more. The ingest-user-agent plugin adds the capability to parse the contents of the User-Agent header of web requests and create additional fields that identify the client platform that performed the request.
Install the plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-user-agent
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm the plugin is installed:
GET /_cat/plugins
Create an ingest pipeline which instructs Elasticsearch which field to reference when parsing a user agent string:
PUT /_ingest/pipeline/useragent
{
  "description" : "Parse User-Agent content",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}
Index a document with the agent field set to an example User-Agent string:
PUT /test/doc/agentexample?pipeline=useragent
{
  "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
}
Retrieve the document to view the fields created by the pipeline:
GET /test/doc/agentexample
The indexed document will include the parsed client data underneath the user_agent JSON key. The User Agent plugin understands a variety of User-Agent strings and can reliably parse User-Agent fields from access logs generated by web servers such as Apache and NGINX.
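Before a log line can be sent through the pipeline, the User-Agent string has to be placed in the agent field. The following Python sketch extracts it from a line in the combined log format, where the user agent is the last of three quoted fields. The function name user_agent_document and the sample line are hypothetical, and the simple pattern does not handle user agents that contain escaped quotes.

```python
import re

QUOTED_FIELD = re.compile(r'"([^"]*)"')


def user_agent_document(log_line, field="agent"):
    """Return a document holding the user agent from a combined-format log line."""
    quoted = QUOTED_FIELD.findall(log_line)
    # The combined format has three quoted fields: request, referer, user agent.
    if len(quoted) < 3:
        return None
    return {field: quoted[-1]}


# Hand-written sample line for illustration only.
sample_line = (
    '203.0.113.7 - - [10/Oct/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"'
)
print(user_agent_document(sample_line))
```

The resulting dictionary can be used as the JSON body of an indexing request that specifies ?pipeline=useragent.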
Conclusion
The plugins covered in this tutorial are a small subset of those available from Elastic or written by third parties. For additional resources regarding Elasticsearch and plugin use, see the links in the More Information section below.
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.