An edition of Apache Solr Beginner's Guide (2013)

Apache Solr Beginner's Guide

by Alfredo Serafini

Last edited by Drini

October 25, 2025

Written in a friendly, example-driven format, the book includes step-by-step instructions and examples designed to help you get started with Apache Solr. It is an entry-level text that centers on a couple of simple projects, such as setting up Solr and customizing the Solr schema and configuration. It is aimed at developers who want to start using Apache Solr but are stuck on, or intimidated by, the difficulty of setting it up and using it. For anyone wanting to embed a search engine in their…

Publish Date

Dec 26, 2013

Publisher

Packt Publishing

Language

English

Pages

324

Previews available in: English

Editions
1. Apache Solr Beginner's Guide. 2017, CreateSpace Independent Publishing Platform, in English. ISBN 1548535370, 9781548535377
2. Apache Solr Beginner's Guide. Dec 26, 2013, Packt Publishing, paperback, in English. ISBN 1782162526, 9781782162520
3. Apache Solr Beginner's Guide. 2013, Packt Publishing, Limited, in English. ISBN 1782162534, 9781782162537

Book Details

Table of Contents

Chapter 1. Getting Ready with the Essentials

Understanding Solr

Learning the powerful aspects of Solr

Working with Java installation

Downloading and installing Java

Configuring CLASSPATH and PATH variables for Java

Installing and testing Solr

Time for action - starting Solr for the first time

Taking a glance at the Solr interface

Time for action - posting some example data

Time for action - testing Solr with cURL

Resources on Solr

How will we use Solr?

Chapter 2. Indexing with Local PDF Files

Understanding and using an index

Posting example documents to the first Solr core

Analyzing the elements we need in Solr core

Time for action - configuring Solr Home and Solr core discovery

Knowing the legacy solr.xml format

Time for action - writing a simple solrconfig.xml file

Time for action - writing a simple schema.xml file

Time for action - starting the new core

Time for action - defining an example document

Time for action - indexing an example document with cURL

Executing the first search on the new core

Adding documents to the index from the web UI

Time for action - updating an existing document

Time for action - cleaning an index

Creating an index prototype from PDF files

Time for action - defining the schema.xml file with only dynamic fields and tokenization

Time for action - writing a simple solrconfig.xml file with an update handler

Testing the PDF file core with dummy data and an example query

Defining a new tokenized field for fulltext

Time for action - using Tika and cURL to extract text from PDFs

Using cURL to index some PDF data

Time for action - finding copies of the same files with deduplication

Time for action - looking inside an index with SimpleTextCodec

Understanding the structure of an inverted index

Understanding how optimization affects the segments of an index

Writing the full configuration for our PDF index example

Writing the solrconfig.xml file

Writing the schema.xml file

Summarizing some easy recipes for the maintenance of an index

Chapter 3. Indexing Example Data from DBpedia - Paintings

Harvesting paintings' data from DBpedia

Analyzing the entities that we want to index

Analyzing the first entity - Painting

Writing Solr core configurations for the first tests

Time for action - defining the basic solrconfig.xml file

Looking at the differences between commits and soft commits

Time for action - defining the simple schema.xml file

Introducing analyzers, tokenizers, and filters

Thinking fields for atomic updates

Indexing a test entity with JSON

Understanding the update chain

Using the atomic update

Understanding how optimistic concurrency works

Time for action - listing all the fields with the CSV output

Defining a new Solr core for our Painting entity

Time for action - refactoring the schema.xml file for the paintings core by introducing tokenization and stop words

Using common field attributes for different use cases

Testing the paintings schema

Collecting the paintings data from DBpedia

Downloading data using the DBpedia SPARQL endpoint

Creating Solr documents for example data

Indexing example data

Testing our paintings core

Time for action - looking at a field using the Schema browser in the web interface

Time for action - searching the new data in the paintings core

Using the Solr web interface for simple maintenance tasks

Chapter 4. Searching the Example Data

Looking at Solr's standard query parameters

Adding a timestamp field for tracking the last modified time

Sending Solr's query parameters over HTTP

Testing HTTP parameters on browsers

Choosing a format for the output

Time for action - searching for all documents with pagination

Time for action - projecting fields with fl

Introducing pseudo-fields and DocTransformers

Adding a constant field using transformers

Time for action - adding a custom DocTransformer to hide empty fields in the results

Looking at core parameters for queries

Using the Lucene query parser with defType

Time for action - searching for terms with a Boolean query

Time for action - using q.op for the default Boolean operator

Time for action - selecting documents with the filter query

Time for action - searching for incomplete terms with the wildcard query

Time for action - using the Boost options

Understanding the basic Lucene score

Time for action - searching for similar terms with fuzzy search

Time for action - writing a simple phrase query example

Time for action - playing with range queries

Time for action - sorting documents with the sort parameter

Playing with the request

Time for action - adding a default parameter to a handler

Playing with the response

Summarizing the parameters that affect result presentation

Analyzing response format

Time for action - enabling XSLT Response Writer with Luke

Listing all fields names with CSV output

Listing all field details for a core

Exploring Solr for Open Data publishing

Publishing results in CSV format

Publishing results with an RSS feed

Good resources on Solr Query Syntax

Chapter 5. Extending Search

Looking at different search parsers - Lucene, Dismax, and Edismax

Starting from the previous core definition

Time for action - inspecting results using the stats and debug components

Looking at Lucene and Solr query parsers

Time for action - debugging a query with the Lucene parser

Time for action - debugging a query with the Dismax parser

Using an Edismax default handler

Time for action - executing a nested Edismax query

A short list of search components

Adding the blooming filter and real-time Get

Time for action - executing a simple pseudo-join query

Highlighting results to improve the search experience

Time for action - generating highlighted snippets over a term

Some idea about geolocalization with Solr

Time for action - creating a repository of cities

Playing more with spatial search

Looking at the new Solr 4 spatial features - from points to polygons

Time for action - expanding the original data with coordinates during the update process

Performing editorial correction on boosting

Introducing the spellcheck component

Time for action - playing with spellchecks

Using a file to spellcheck against a list of controlled words

Collecting some hints for spellchecking analysis

Chapter 6. Using Faceted Search - from Searching to Finding

Exploring documents suggestion and matching with faceted search

Time for action - prototyping an auto-suggester with facets

Time for action - creating wordclouds on facets to view and analyze data

Thinking about faceted search and findability

Faceting for narrowing searches and exploring data

Time for action - defining facets over enumerated fields

Performing data normalization for the keyword field during the update phase

Reading more about Solr faceting parameters

Time for action - finding interesting topics using faceting on tokenized fields with a filter query

Using filter queries for caching filters

Time for action - finding interesting subjects using a facet query

Time for action - using range queries and facet range queries

Time for action - using a hierarchical facet (pivot)

Introducing group and field collapsing

Time for action - grouping results

Playing with terms

Time for action - playing with a term suggester

Thinking about term vectors and similarity

Moving to semantics with vector space models

Looking at the next step - customizing similarity

Time for action - having a look at the term vectors

Reading about functions

Introducing the More Like This component and recommendations

Time for action - obtaining similar documents by More Like This

Adopting a More Like This handler

Chapter 7. Working with Multiple Entities, Multicores, and Distributed Search

Working with multiple entities

Time for action - searching for cities using multiple core joins

Preparing example data for multiple entities

Downloading files for multiple entities

Generating Solr documents

Playing with joins on multicores (a core for every entity)

Using sharding for distributed search

Time for action - playing with sharding (distributed search)

Time for action - finding a document from any shard

Collecting some ideas on schemaless versus normalization

Creating a single denormalized index

Adding a field to track entity type

Analyzing, designing, and refactoring our domain

Using document clustering as a domain analysis tool

Managing index replication

Clustering Solr for distributed search using SolrCloud

Taking a journey from single core to SolrCloud

Understanding why we need Zookeeper

Time for action - testing SolrCloud and Zookeeper locally

Looking at the suggested configurations for SolrCloud

Changing the schema.xml file

Changing the solrconfig.xml file

Knowing the pros and cons of SolrCloud

Chapter 8. Indexing External Data sources

Stepping further into the real world

Collecting example data from the Web Gallery of Art site

Time for action - indexing data from a database (for example, a blog or an e-commerce website)

Time for action - handling sub-entities (for example, joins on complex data)

Time for action - indexing incrementally using delta imports

Time for action - indexing CSV (for example, open data)

Time for action - importing Solr XML document files

Importing data from another Solr instance

Indexing emails

Time for action - indexing rich documents (for example, PDF)

Adding more consideration about tuning

Understanding Java Virtual Machine, threads, and Solr

Choosing the correct directory for implementation

Adopting Solr cache

Time for action - indexing artist data from Tate Gallery and DBpedia

Using DataImportHandler

Chapter 9. Introducing Customizations

Looking at the Solr customizations

Adding some more details to the core discovery

Playing with specific languages

Time for action - detecting language with Tika and LangDetect

Introducing stemming for query expansion

Time for action - adopting a stemmer

Testing language analysis with JUnit and Scala

Writing new Solr plugins

Introducing Solr plugin structure and lifecycle

Implementing interfaces for obtaining information

Following an example plugin lifecycle

Time for action - writing a new ResponseWriter plugin with the Thymeleaf library

Using Maven for development

Time for action - integrating Stanford NER for Named Entity extraction

Pointing ideas for Solr's customizations

Appendix. Solr Clients and Integrations

Introducing SolrJ - an embedded or remote Solr client using the Java (JVM) API

Time for action - playing with an embedded Solr instance

Choosing between an embedded or remote Solr instance

Time for action - playing with an external HttpSolrServer

Time for action - using Bean Scripting Framework and JavaScript

Alfresco DMS and CMS

Solr Groovy or the Grails plugin

Writing Solr clients and integrations outside JVM

Taking a glance at ajax-solr, solrstrap, facetview, and jcloud

Magento e-commerce

Platforms for analyzing, manipulating, and enhancing text

Pop Quiz Answers

Classifications

Library of Congress: TK5105.884, QA76.9.F5

The Physical Object

Format: paperback
Number of pages: 324

Edition Identifiers

Open Library: OL37756397M
Internet Archive: apachesolrbeginn0000sera
ISBN 10: 1782162526
ISBN 13: 9781782162520
OCLC/WorldCat: 867316995, 870467607

Work Identifiers

Work ID: OL27686333W

History

Created March 14, 2022
6 revisions

October 25, 2025	Edited by Drini	import existing book
October 9, 2024	Edited by MARC Bot	import existing book
October 8, 2023	Edited by raybb	Bulk tagging works
December 17, 2022	Edited by ImportBot	import existing book
March 14, 2022	Created by ImportBot	import new book