Haystack
Haystack is a Python library that provides modular search for Django. It features an API that provides support for different search back ends such as Elasticsearch, Whoosh, Xapian, and Solr.
Elasticsearch
Elasticsearch is a popular search engine built on Lucene and written in Java. It is capable of full-text search.
Web search engines such as Google take a similar approach of indexing their data ahead of time, which is why a few keywords are enough to retrieve relevant information quickly.
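To make the idea of indexing concrete, here is a minimal sketch of an inverted index in plain Python. It only illustrates the general technique; it is not how Elasticsearch or Lucene is actually implemented, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data, used only for illustration.
documents = {
    1: "Albert Einstein savings account",
    2: "Alice Johnson current account",
    3: "Albert Camus current account",
}
index = build_inverted_index(documents)
```

Because the index is built once up front, answering a query is a matter of looking up a few keys and intersecting sets rather than scanning every document.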
Install Django Haystack and Elasticsearch
The first step is to get Elasticsearch up and running locally on your machine. Elasticsearch requires Java, so you need to have Java installed on your machine.
We are going to follow the instructions from the Elasticsearch site.
Download the Elasticsearch 1.4.5 tar as follows:
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.4.5.tar.gz
Extract it as follows:
tar -xvf elasticsearch-1.4.5.tar.gz
This creates an elasticsearch-1.4.5 directory containing a number of files and folders. Go into its bin directory as follows:
cd elasticsearch-1.4.5/bin
Start Elasticsearch as follows.
./elasticsearch
To confirm that it is running, go to http://127.0.0.1:9200/. You should see something like this. The version fields reflect whichever release you are running; this sample came from a 5.6.3 node.
{
  "name" : "W3nGEDa",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ygpVDczbR4OI5sx5lzo0-w",
  "version" : {
    "number" : "5.6.3",
    "build_hash" : "1a2f265",
    "build_date" : "2017-10-06T20:33:39.012Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
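If you would rather check from a script than from the browser, something like the following sketch could work. It uses only the standard library; the function names are made up for this example, and fetch_cluster_info assumes a node is listening on the default local address.

```python
import json
from urllib.request import urlopen


def parse_cluster_info(body):
    """Extract the cluster name and version number from the root response."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def fetch_cluster_info(url="http://127.0.0.1:9200/", timeout=5):
    """Request the root endpoint of a running node and parse the reply."""
    with urlopen(url, timeout=timeout) as response:
        return parse_cluster_info(response.read().decode("utf-8"))


# Trimmed copy of the sample response shown above.
sample = '{"cluster_name": "elasticsearch", "version": {"number": "5.6.3"}}'
cluster_name, version = parse_cluster_info(sample)
```

Calling fetch_cluster_info() raises an error from urllib if nothing is listening on the port, which is itself a quick signal that Elasticsearch is not up yet.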
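Ensure you also have Haystack installed. Haystack's Elasticsearch back end needs the elasticsearch Python client in a release that matches your server (the 1.x client for Elasticsearch 1.4.5), and the API view later in this tutorial uses Django REST Framework.
pip install django-haystack djangorestframework "elasticsearch>=1.0.0,<2.0.0"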
Let’s create our Django project. Our project will be able to index all the customers in a bank, making it easy to search and retrieve data using just a few search terms.
django-admin startproject Bank
This command creates files that provide configurations for Django projects.
Let’s create an app for customers.
cd Bank
python manage.py startapp customers
settings.py Configurations
In order to use Elasticsearch to index our searchable content, we’ll need to define a back-end setting for Haystack in our project’s settings.py file. We are going to use Elasticsearch as our back end.
HAYSTACK_CONNECTIONS is a required setting and should look like this:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
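If the Elasticsearch address differs between machines, one option is to build the setting from environment variables. This is only a sketch: the helper function and the variable names ELASTICSEARCH_URL and HAYSTACK_INDEX_NAME are assumptions made for this example, not something Haystack defines.

```python
import os


def build_haystack_connections(environ=os.environ):
    """Build HAYSTACK_CONNECTIONS, letting the environment override defaults."""
    return {
        "default": {
            "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
            # ELASTICSEARCH_URL and HAYSTACK_INDEX_NAME are hypothetical
            # variable names chosen for this example.
            "URL": environ.get("ELASTICSEARCH_URL", "http://127.0.0.1:9200/"),
            "INDEX_NAME": environ.get("HAYSTACK_INDEX_NAME", "haystack"),
        },
    }


HAYSTACK_CONNECTIONS = build_haystack_connections()
```

With no variables set, this produces the same dictionary as the literal setting above.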
Within settings.py, we are also going to add rest_framework, haystack, and customers to the list of installed apps.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'haystack',
    'customers',
]
Create Models
Let’s create a model for Customers. In customers/models.py, add the following code.
from __future__ import unicode_literals

from django.db import models

# Allowed values for Customer.customer_status.
customer_type = (
    ("Active", "Active"),
    ("Inactive", "Inactive"),
)


class Customer(models.Model):
    id = models.IntegerField(primary_key=True)
    first_name = models.CharField(max_length=50, null=False, blank=True)
    last_name = models.CharField(max_length=50, null=False, blank=True)
    other_names = models.CharField(max_length=50, default=" ")
    email = models.EmailField(max_length=100, null=True, blank=True)
    phone = models.CharField(max_length=30, null=False, blank=True)
    # An integer default, not the string "0".
    balance = models.IntegerField(default=0)
    customer_status = models.CharField(
        max_length=100, choices=customer_type, default="Active")
    address = models.CharField(max_length=50, null=False, blank=False)

    def save(self, *args, **kwargs):
        return super(Customer, self).save(*args, **kwargs)

    def __unicode__(self):
        return "{}:{}".format(self.first_name, self.last_name)
Register your Customer model in admin.py like this:
from django.contrib import admin

from .models import Customer

# Register your models here.
admin.site.register(Customer)
Create Database and Super User
Create the migration for the customers app, apply your migrations, and create an admin account.
python manage.py makemigrations customers
python manage.py migrate
python manage.py createsuperuser
Run your server and navigate to http://localhost:8000/admin/. You should now be able to see your Customer model there. Go ahead and add new customers in the admin.
Indexing Data
To index our models, we begin by creating a SearchIndex. SearchIndex objects are the way Haystack determines what data should be placed in the search index, and they handle the flow of data into it. Each type of model must have its own unique SearchIndex.
To build a SearchIndex, we inherit from indexes.SearchIndex and indexes.Indexable, define the fields we want to store our data with, and define a get_model method.
Let’s create the CustomerIndex to correspond to our Customer model. Create a file search_indexes.py in the customers app directory, and add the following code.
from haystack import indexes

from .models import Customer


class CustomerIndex(indexes.SearchIndex, indexes.Indexable):
    # Primary document field, built from a template.
    text = indexes.EdgeNgramField(document=True, use_template=True)
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    other_names = indexes.CharField(model_attr='other_names')
    email = indexes.CharField(model_attr='email', default=" ")
    phone = indexes.CharField(model_attr='phone', default=" ")
    balance = indexes.IntegerField(model_attr='balance', default=0)
    customer_status = indexes.CharField(model_attr='customer_status')
    address = indexes.CharField(model_attr='address', default=" ")

    def get_model(self):
        return Customer

    def index_queryset(self, using=None):
        # Index every customer.
        return self.get_model().objects.all()
The EdgeNgramField is a Haystack SearchIndex field that indexes n-grams taken from the start of each word, which prevents incorrect matches when parts of two different words are mashed together.
It allows us to use the autocomplete feature to conduct queries. We will use autocomplete when we start querying our data.
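The following plain-Python sketch shows what edge n-grams look like and why they suit autocomplete. It is an illustration of the idea only, not Haystack's or Elasticsearch's implementation, and the min_gram and max_gram values are assumptions chosen for the example.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the prefixes of a word from min_gram to max_gram characters."""
    word = word.lower()
    longest = min(len(word), max_gram)
    return [word[:size] for size in range(min_gram, longest + 1)]


def autocomplete_matches(query, names):
    """Return the names that have a word whose edge n-grams include the query."""
    query = query.lower()
    return [
        name
        for name in names
        if any(query in edge_ngrams(part) for part in name.split())
    ]
```

For example, edge_ngrams("Albert") yields "al", "alb", "albe", "alber", and "albert". A query of "Al" therefore matches "Albert Smith" but not "Bob Walton", because "al" only appears in the middle of "Walton" and never at the start of a word.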
document=True indicates the primary field for searching within. Additionally, use_template=True in the text field allows us to use a data template to build the document that will be indexed.
Let’s create the template inside our customers template directory. Haystack looks for it at search/indexes/<app_label>/<model_name>_<field_name>.txt, so inside search/indexes/customers/customer_text.txt, add the following:
{{object.first_name}} {{object.last_name}} {{object.other_names}}
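Conceptually, the template above turns each customer into one blob of text that becomes the searchable document. The sketch below imitates that step in plain Python under simplifying assumptions: CustomerRecord is a hypothetical stand-in for a model instance, and the real rendering is done by Django's template engine.

```python
from collections import namedtuple

# Hypothetical stand-in for a Customer model instance.
CustomerRecord = namedtuple("CustomerRecord", "first_name last_name other_names")


def build_document_text(customer):
    """Join the same three fields the template lists into one text blob."""
    fields = (customer.first_name, customer.last_name, customer.other_names)
    return " ".join(value.strip() for value in fields if value.strip())


record = CustomerRecord("Albert", "Einstein", " ")
document_text = build_document_text(record)
```

Only the fields that end up in this text are searchable through the document field, so a field such as address would have to be added to the template before searches on text could match it.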
Reindex Data
Now that our data is in the database, it’s time to put it in our search index. To do this, simply run ./manage.py rebuild_index. You’ll get totals of how many models were processed and placed in the index.
Indexing 20 customers
Alternatively, you can use RealtimeSignalProcessor, which automatically handles updates/deletes for you. To use it, add the following in the settings.py file.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Querying Data
We are going to use a search template and the Haystack API to query data.
Search Template
Add the haystack urls to your URLconf.
url(r'^search/', include('haystack.urls')),
Let’s create our search template. Haystack's default search view renders search/search.html, so in templates/search/search.html, add the following code.
{% block head %} {% endblock %} {% block navbar %} {% endblock %} {% block content %}{% endblock %}
page.object_list is a list of SearchResult objects. Each result exposes the stored index fields, for example result.first_name, and gives access to the underlying model instance through result.object.
Your complete project structure should look something like this:
Now run the server, go to 127.0.0.1:8000/search/, and try a search.
A search for Albert will return all customers with the name Albert. If no customer has that name, the query returns no results. Feel free to play around with your own data.
Haystack API
Haystack has a SearchQuerySet class that is designed to make it easy and consistent to perform searches and iterate results. Much of the SearchQuerySet API will be familiar from Django’s ORM QuerySet.
In customers/views.py, add the following code:
from haystack.query import SearchQuerySet
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .models import Customer


@api_view(['POST'])
def search_customer(request):
    # The search term is read from the "name" key of the POST body.
    name = request.data['name']
    customer = SearchQuerySet().models(Customer).autocomplete(
        first_name__startswith=name)

    # Build a plain list of dicts from the stored fields of each result.
    searched_data = []
    for i in customer:
        all_results = {
            "first_name": i.first_name,
            "last_name": i.last_name,
            "balance": i.balance,
            "status": i.customer_status,
        }
        searched_data.append(all_results)
    return Response(searched_data)
autocomplete is a shortcut method to perform an autocomplete search. It must be run against fields that are either EdgeNgramField or NgramField.
In the above SearchQuerySet, we are using the startswith lookup to restrict our search to results whose first_name starts with the characters we supply. For example, Al will only retrieve the details of customers whose first name starts with Al. Note that a search against the document field (text) only covers the fields that have been listed in the customer_text.txt file.
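To try the search_customer view above, it has to be routed in your URLconf and then sent a POST request with a name key. The sketch below builds such a request with the standard library. The URL path is hypothetical, since it depends on how you route the view, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_search_request(name, url="http://127.0.0.1:8000/customers/search/"):
    """Build a JSON POST request carrying the `name` key the view reads."""
    payload = json.dumps({"name": name}).encode("utf-8")
    return Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("Al")
# With the Django server running and the view routed, the request could be
# sent with urllib.request.urlopen(request) and the reply parsed as JSON.
```

The reply would be a JSON list with one object per matching customer, holding the first_name, last_name, balance, and status keys that the view assembles.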
Apart from the startswith field lookup, Haystack supports other field lookups for performing queries. The full list is:
- content
- contains
- exact
- gt
- gte
- lt
- lte
- in
- startswith
- endswith
- range
- fuzzy
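As a rough mental model, a few of these lookups can be approximated in plain Python as below. This is only an analogy: the real matching is done by the search back end and can differ, for example in tokenization and case handling. The helper names and sample records are made up for this sketch.

```python
# Plain-Python approximations of a few lookups; the search back end does the
# real matching and may behave differently.
LOOKUPS = {
    "exact": lambda value, arg: value == arg,
    "contains": lambda value, arg: arg in value,
    "startswith": lambda value, arg: value.startswith(arg),
    "endswith": lambda value, arg: value.endswith(arg),
    "gt": lambda value, arg: value > arg,
    "gte": lambda value, arg: value >= arg,
    "lt": lambda value, arg: value < arg,
    "lte": lambda value, arg: value <= arg,
    "in": lambda value, arg: value in arg,
    "range": lambda value, arg: arg[0] <= value <= arg[1],
}


def filter_records(records, **conditions):
    """Keep the records that satisfy every field__lookup=value condition."""
    matched = []
    for record in records:
        checks = []
        for key, arg in conditions.items():
            # In this sketch every condition must name its lookup explicitly.
            field, lookup = key.split("__", 1)
            checks.append(LOOKUPS[lookup](record[field], arg))
        if all(checks):
            matched.append(record)
    return matched


# Hypothetical sample records.
customers = [
    {"first_name": "Albert", "balance": 500},
    {"first_name": "Alice", "balance": 50},
    {"first_name": "Bob", "balance": 1200},
]
```

With Haystack itself, the same keyword style is passed to SearchQuerySet().filter(...), for example filter(balance__gte=500).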
Conclusion
A huge amount of data is produced at any given moment in social media, health, shopping, and other sectors. Much of this data is unstructured and scattered. Elasticsearch can be used to process and analyze this data into a form that can be understood and consumed.
Elasticsearch has also been used extensively for content search, data analysis, and queries. For more information, visit the Haystack and Elasticsearch sites.