Chapter 14: Getting started with Elasticsearch in Python

Posted Jun 16, 2020 · 7 min read

My Elasticsearch articles are updated gradually; you are welcome to follow the series:
0A. About Elasticsearch and example applications
00. Solr vs. ElasticSearch
01. What can ElasticSearch do?
02. An introduction to Elastic Stack features
03. How to install and set up the Elasticsearch API
04. CRUD operations through the elasticsearch-head plugin
05. Using multiple Elasticsearch instances and the head plugin
06. How does Elasticsearch work when indexing documents?
07. Mapping in Elasticsearch: a concise tutorial
08. Analysis and analyzers in Elasticsearch
09. Building a custom analyzer in Elasticsearch
10. Kibana as an Elasticsearch development tool
11. Elasticsearch query methods
12. Elasticsearch full-text queries
13. Elasticsearch term-level queries
14. Getting started with Elasticsearch in Python

Beyond this getting-started article, I also highly recommend the ElasticSearch Building Manual to you; it is a genuinely hands-on getting-started guide.

In this article, I will discuss Elasticsearch and how to integrate it with different Python applications.

What is ElasticSearch?

ElasticSearch (ES) is a distributed, highly available open-source search engine built on top of Apache Lucene. It is written in Java and open source, so it runs on many platforms. It stores unstructured data in JSON format, which also makes it a NoSQL database. Unlike other NoSQL databases, however, ES also provides search-engine capabilities and related features.

ElasticSearch use cases

You can use ES for multiple purposes; a couple of them are given below:

You run a website that serves a lot of dynamic content, whether an e-commerce site or a blog. By implementing ES you can not only give your web application a powerful search engine, but also provide native autocomplete functionality in the application.

You can ingest different kinds of log data and then use it to find trends and statistics.

Set up and run
The easiest way to install ElasticSearch is to download it and run the executable. You must make sure you are using Java 8 or later (the minimum for the 6.x releases used here).
After downloading, unzip it and run its binary file.

elasticsearch-6.2.4 $ bin/elasticsearch

A lot of text will scroll by in the window. If you see a line similar to the following, everything is up and running:

[2018-05-27T17:36:11,744][INFO][o.e.h.n.Netty4HttpServerTransport] [c6hEGv4] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}

But since seeing is believing, visit http://localhost:9200 in your browser or via cURL, and the following content should greet you.

{
  "name": "c6hEGv4",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "HkRyTYXvSkGvkvHX2Q1-oQ",
  "version": {
    "number": "6.2.4",
    "build_hash": "ccec39f",
    "build_date": "2018-04-12T20:37:28.497551Z",
    "build_snapshot": false,
    "lucene_version": "7.2.1",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}

Now, before moving on to accessing Elasticsearch from Python, let's do a few basic things. As I mentioned, ES provides a REST API, which we will use to perform different tasks.
Basic example
The first thing you have to do is create an index. Everything is stored in an index. The RDBMS equivalent of an index is a database, so don't confuse it with the typical indexing concepts you learned in RDBMS land. I am using PostMan to run the REST API calls.
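The request screenshot is not reproduced here, but as the cURL version further below confirms, it is simply a PUT to the index URL with no body:

PUT http://localhost:9200/company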

If the operation is successful, you will see a response similar to the one below.

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "company"
}

So we created a database called company; in other words, we created an index named company. If you access it through a browser, you will see something like the following:

http://localhost:9200/company
{
  "company": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1527638692850",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "RnT-gXISSxKchyowgjZOkQ",
        "version": {
          "created": "6020499"
        },
        "provided_name": "company"
      }
    }
  }
}

Hold on about mappings; we will discuss them later. A mapping is essentially the schema for your documents. creation_date is self-explanatory. number_of_shards tells how many partitions the index data will be split into. It does not make sense to keep all the data on a single disk; if you are running a cluster of multiple Elastic nodes, the entire data set is split across them. In short, with 5 shards the whole data set is spread over 5 shards, and the ElasticSearch cluster can serve a request from any of its nodes.
Replicas are mirror copies of the data. If you are familiar with the master-slave concept, this should not be new to you. You can learn more about basic ES concepts here.
The cURL version of creating the index is a one-liner (shard and replica counts can also be set in the same request; see the sketch just after it):

 elasticsearch-6.2.4 $ curl -X PUT localhost:9200/company
{"acknowledged":true,"shards_acknowledged":true,"index":"company"}

You can also create an index and insert a record in one go. All you have to do is pass the record in JSON format. In PostMan, it looks like this:
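The screenshot is omitted here; judging from the response below, the request is a PUT to the document URL with the record as the JSON body:

PUT http://localhost:9200/company/employees/1
{
    "name": "Adnan Siddiqi",
    "occupation": "Consultant"
}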

Make sure to set Content-Type to application/json.
If company does not exist, this creates an index named company and then creates a new type named employees under it. A type is essentially the ES version of a table in an RDBMS.
The above request produces the following JSON structure:

{
    "_index":"company",
    "_type":"employees",
    "_id":"1",
    "_version":1,
    "result":"created",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":0,
    "_primary_term":1
}

We passed /1 as the ID of the record. This is not required; it simply sets the _id field to 1. We then passed the data in JSON format, which gets inserted as a new record, or document. If you visit http://localhost:9200/company/employees/1 from a browser, you will see the following:

{"_index":"company","_type":"employees","_id":"1","_version":1,"found":true,"_source":{
    "name":"Adnan Siddiqi",
    "occupation":"Consultant"
}
 }

You can see the actual record along with its metadata. If you want only the JSON structure of the record itself, change the request to http://localhost:9200/company/employees/1/_source.
The cURL version is:

 elasticsearch-6.2.4 $ curl localhost:9200/company/employees/1
{"_index":"company","_type":"employees","_id":"1","_version":1,"found":true,"_source":{
    "name":"Adnan Siddiqi",
    "occupation":"Consultant"
}}

What if you want to update the record? It's very simple: send the same PUT request with the modified JSON body. As follows:
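The screenshot is omitted; the request is the same PUT as before with a changed body, for example (the new field value here is purely illustrative):

PUT http://localhost:9200/company/employees/1
{
    "name": "Adnan Siddiqi",
    "occupation": "Software Consultant"
}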

It produces output along these lines:

{
    "_index":"company",
    "_type":"employees",
    "_id":"1",
    "_version":2,
    "result":"updated",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":1,
    "_primary_term":1
}

Note that the result field is now set to updated instead of created, and _version has been incremented.
Of course, you can also delete a record.
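A delete is just a DELETE request to the same document URL, for example:

curl -X DELETE localhost:9200/company/employees/1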

And if you are going crazy, or your girlfriend has dumped you, you can burn down the entire world by running curl -XDELETE localhost:9200/_all from the command line.
Let's do some basic searching. If you run http://localhost:9200/company/employees/_search?q=adnan, it searches all fields under the employees type and returns the matching records, along these lines (timings and scores will vary):

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "company",
                "_type": "employees",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "name": "Adnan Siddiqi",
                    "occupation": "Consultant"
                }
            }
        ]
    }
}

The max_score field indicates how relevant the best-matching record is; it is the highest score among the returned records. With multiple matching records, each will carry its own _score.

You can also limit the search to a certain field by passing the field name. So http://localhost:9200/company/employees/_search?q=name:Adnan will search only in the name field of the documents. It is roughly the SQL equivalent of SELECT * FROM table WHERE name = 'Adnan'.
I have only covered basic examples here. ES can do much more, but I will let you explore that in the documentation; let's switch to accessing ES from Python.

Access ElasticSearch in Python
To be honest, ES's REST API is good enough that you could use the requests library for every task. Still, the Python client library for ElasticSearch lets you focus on your main task instead of worrying about how to construct requests.
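For instance, fetching the document we created earlier over raw requests takes only a couple of lines; a minimal sketch (the URL mirrors the earlier examples):

import requests

# Fetch the employee document created earlier via the REST API
r = requests.get('http://localhost:9200/company/employees/1')
print(r.json()['_source'])  # {'name': 'Adnan Siddiqi', 'occupation': 'Consultant'}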
Install the elasticsearch client via pip, and then you can access it in a Python program.
pip install elasticsearch
To ensure that it is installed correctly, run the following basic code snippet from the command line:

elasticsearch-6.2.4 $ python
Python 3.6.4 | Anaconda custom(64-bit)|(default, Jan 16 2018, 12:04:33)
[GCC 4.2.1 Compatible Clang 4.0.1(tags/RELEASE_401/final)]on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch([{'host':'localhost','port':9200}])
>>> es
<Elasticsearch([{'host':'localhost','port':9200}])>
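With the connection object in hand, the client mirrors everything we did over REST. A minimal sketch (the method names are from the official elasticsearch-py client; the index, type, and field names reuse the company/employees example above):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Insert (or overwrite) a document, like the PUT example earlier
res = es.index(index='company', doc_type='employees', id=1,
               body={'name': 'Adnan Siddiqi', 'occupation': 'Consultant'})
print(res['result'])  # 'created' on the first run, 'updated' after that

# Fetch it back by ID
doc = es.get(index='company', doc_type='employees', id=1)
print(doc['_source'])

# Search, roughly equivalent to ?q=name:Adnan from the earlier example
hits = es.search(index='company', doc_type='employees',
                 body={'query': {'match': {'name': 'Adnan'}}})
print(hits['hits']['total'])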

Web scraping and Elasticsearch
Let's discuss a practical use case for Elasticsearch. The goal is to fetch online recipes and store them in Elasticsearch for search and analytics. We will first scrape the data from Allrecipes and store it in ES. On the ES side, we will also create a strict schema, or mapping, to make sure the data is indexed in the correct format and types. I am only pulling the list of salad recipes. Let's get started!
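As a taste of the strict mapping just mentioned, here is a minimal sketch of creating such an index through the Python client (the recipes index, salads type, and the field set are illustrative assumptions, not the final schema; 'dynamic': 'strict' makes ES reject documents containing unmapped fields):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Illustrative strict mapping for recipe documents
settings = {
    'mappings': {
        'salads': {
            'dynamic': 'strict',
            'properties': {
                'title': {'type': 'text'},
                'submitter': {'type': 'text'},
                'description': {'type': 'text'},
                'calories': {'type': 'integer'},
            }
        }
    }
}

# Create the index only if it does not exist yet
if not es.indices.exists(index='recipes'):
    es.indices.create(index='recipes', body=settings)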
Fetching the data

In conclusion
Elasticsearch is a powerful tool that can help you make existing or new applications searchable, with strong capabilities for returning highly accurate result sets. I have only touched on the essentials here. Read the documentation and get familiar with this powerful tool; the fuzzy-search capability in particular is excellent. If I get the chance, I will cover the Query DSL in a future article.