An edition of Apache Solr Beginner's Guide (2013)

Apache Solr Beginner's Guide

Last edited by Drini
October 25, 2025

Written in a friendly, example-driven format, the book includes plenty of step-by-step instructions and examples designed to help you get started with Apache Solr. It is an entry-level introduction to Apache Solr, centered on a couple of simple projects such as setting up Solr and customizing its schema and configuration. It is aimed at developers who want to start using Apache Solr but are stuck or intimidated by the difficulty of setting it up and using it. For anyone wanting to embed a search engine in their.

Publish Date
Publisher
Packt Publishing
Language
English
Pages
324

Previews available in: English

Edition Availability
Apache Solr Beginner's Guide
2017, CreateSpace Independent Publishing Platform
in English
Apache Solr Beginner's Guide
Dec 26, 2013, Packt Publishing
paperback in English
Apache Solr Beginner's Guide
2013, Packt Publishing, Limited
in English

Book Details


Table of Contents

Preface
Page 1
Chapter 1. Getting Ready with the Essentials
Page 7
Understanding Solr
Page 7
Learning the powerful aspects of Solr
Page 8
Working with Java installation
Page 11
Downloading and installing Java
Page 12
Configuring CLASSPATH and PATH variables for Java
Page 12
Installing and testing Solr
Page 12
Time for action - starting Solr for the first time
Page 14
Taking a glance at the Solr interface
Page 15
Time for action - posting some example data
Page 17
Time for action - testing Solr with cURL
Page 19
Who uses Solr?
Page 20
Resources on Solr
Page 20
How will we use Solr?
Page 21
Summary
Page 22
Chapter 2. Indexing with Local PDF Files
Page 23
Understanding and using an index
Page 23
Posting example documents to the first Solr core
Page 24
Analyzing the elements we need in Solr core
Page 25
Time for action - configuring Solr Home and Solr core discovery
Page 26
Knowing the legacy solr.xml format
Page 27
Time for action - writing a simple solrconfig.xml file
Page 28
Time for action - writing a simple schema.xml file
Page 29
Time for action - starting the new core
Page 32
Time for action - defining an example document
Page 34
Time for action - indexing an example document with cURL
Page 35
Executing the first search on the new core
Page 37
Adding documents to the index from the web UI
Page 39
Time for action - updating an existing document
Page 41
Time for action - cleaning an index
Page 41
Creating an index prototype from PDF files
Page 42
Time for action - defining the schema.xml file with only dynamic fields and tokenization
Page 42
Time for action - writing a simple solrconfig.xml file with an update handler
Page 43
Testing the PDF file core with dummy data and an example query
Page 44
Defining a new tokenized field for fulltext
Page 45
Time for action - using Tika and cURL to extract text from PDFs
Page 46
Using cURL to index some PDF data
Page 48
Time for action - finding copies of the same files with deduplication
Page 48
Time for action - looking inside an index with SimpleTextCodec
Page 50
Understanding the structure of an inverted index
Page 52
Understanding how optimization affects the segments of an index
Page 53
Writing the full configuration for our PDF index example
Page 53
Writing the solrconfig.xml file
Page 53
Writing the schema.xml file
Page 54
Summarizing some easy recipes for the maintenance of an index
Page 55
Summary
Page 58
Chapter 3. Indexing Example Data from DBpedia - Paintings
Page 59
Harvesting paintings' data from DBpedia
Page 59
Analyzing the entities that we want to index
Page 61
Analyzing the first entity - Painting
Page 62
Writing Solr core configurations for the first tests
Page 63
Time for action - defining the basic solrconfig.xml file
Page 63
Looking at the differences between commits and soft commits
Page 65
Time for action - defining the simple schema.xml file
Page 65
Introducing analyzers, tokenizers, and filters
Page 67
Thinking fields for atomic updates
Page 68
Indexing a test entity with JSON
Page 68
Understanding the update chain
Page 70
Using the atomic update
Page 70
Understanding how optimistic concurrency works
Page 72
Time for action - listing all the fields with the CSV output
Page 73
Defining a new Solr core for our Painting entity
Page 73
Time for action - refactoring the schema.xml file for the paintings core by introducing tokenization and stop words
Page 74
Using common field attributes for different use cases
Page 76
Testing the paintings schema
Page 76
Collecting the paintings data from DBpedia
Page 77
Downloading data using the DBpedia SPARQL endpoint
Page 77
Creating Solr documents for example data
Page 79
Indexing example data
Page 79
Testing our paintings core
Page 79
Time for action - looking at a field using the Schema browser in the web interface
Page 80
Time for action - searching the new data in the paintings core
Page 80
Using the Solr web interface for simple maintenance tasks
Page 83
Summary
Page 85
Chapter 4. Searching the Example Data
Page 87
Looking at Solr's standard query parameters
Page 87
Adding a timestamp field for tracking the last modified time
Page 88
Sending Solr's query parameters over HTTP
Page 89
Testing HTTP parameters on browsers
Page 89
Choosing a format for the output
Page 90
Time for action - searching for all documents with pagination
Page 90
Time for action - projecting fields with fl
Page 92
Introducing pseudo-fields and DocTransformers
Page 94
Adding a constant field using transformers
Page 94
Time for action - adding a custom DocTransformer to hide empty fields in the results
Page 95
Looking at core parameters for queries
Page 96
Using the Lucene query parser with defType
Page 98
Time for action - searching for terms with a Boolean query
Page 98
Time for action - using q.op for the default Boolean operator
Page 99
Time for action - selecting documents with the filter query
Page 100
Time for action - searching for incomplete terms with the wildcard query
Page 101
Time for action - using the Boost options
Page 102
Understanding the basic Lucene score
Page 102
Time for action - searching for similar terms with fuzzy search
Page 103
Time for action - writing a simple phrase query example
Page 104
Time for action - playing with range queries
Page 104
Time for action - sorting documents with the sort parameter
Page 105
Playing with the request
Page 106
Time for action - adding a default parameter to a handler
Page 106
Playing with the response
Page 108
Summarizing the parameters that affect result presentation
Page 109
Analyzing response format
Page 110
Time for action - enabling XSLT Response Writer with Luke
Page 111
Listing all fields names with CSV output
Page 112
Listing all field details for a core
Page 112
Exploring Solr for Open Data publishing
Page 113
Publishing results in CSV format
Page 113
Publishing results with an RSS feed
Page 113
Good resources on Solr Query Syntax
Page 114
Summary
Page 115
Chapter 5. Extending Search
Page 117
Looking at different search parsers - Lucene, Dismax, and Edismax
Page 117
Starting from the previous core definition
Page 118
Time for action - inspecting results using the stats and debug components
Page 118
Looking at Lucene and Solr query parsers
Page 121
Time for action - debugging a query with the Lucene parser
Page 122
Time for action - debugging a query with the Dismax parser
Page 124
Using an Edismax default handler
Page 125
Time for action - executing a nested Edismax query
Page 127
A short list of search components
Page 128
Adding the blooming filter and real-time Get
Page 130
Time for action - executing a simple pseudo-join query
Page 131
Highlighting results to improve the search experience
Page 132
Time for action - generating highlighted snippets over a term
Page 132
Some idea about geolocalization with Solr
Page 134
Time for action - creating a repository of cities
Page 135
Playing more with spatial search
Page 137
Looking at the new Solr 4 spatial features - from points to polygons
Page 137
Time for action - expanding the original data with coordinates during the update process
Page 139
Performing editorial correction on boosting
Page 141
Introducing the spellcheck component
Page 142
Time for action - playing with spellchecks
Page 143
Using a file to spellcheck against a list of controlled words
Page 147
Collecting some hints for spellchecking analysis
Page 148
Summary
Page 150
Chapter 6. Using Faceted Search - from Searching to Finding
Page 151
Exploring documents suggestion and matching with faceted search
Page 151
Time for action - prototyping an auto-suggester with facets
Page 152
Time for action - creating wordclouds on facets to view and analyze data
Page 153
Thinking about faceted search and findability
Page 155
Faceting for narrowing searches and exploring data
Page 156
Time for action - defining facets over enumerated fields
Page 158
Performing data normalization for the keyword field during the update phase
Page 160
Reading more about Solr faceting parameters
Page 161
Time for action - finding interesting topics using faceting on tokenized fields with a filter query
Page 161
Using filter queries for caching filters
Page 164
Time for action - finding interesting subjects using a facet query
Page 166
Time for action - using range queries and facet range queries
Page 168
Time for action - using a hierarchical facet (pivot)
Page 169
Introducing group and field collapsing
Page 170
Time for action - grouping results
Page 171
Playing with terms
Page 173
Time for action - playing with a term suggester
Page 173
Thinking about term vectors and similarity
Page 176
Moving to semantics with vector space models
Page 177
Looking at the next step - customizing similarity
Page 177
Time for action - having a look at the term vectors
Page 178
Reading about functions
Page 180
Introducing the More Like This component and recommendations
Page 181
Time for action - obtaining similar documents by More Like This
Page 182
Adopting a More Like This handler
Page 183
Summary
Page 184
Chapter 7. Working with Multiple Entities, Multicores, and Distributed Search
Page 185
Working with multiple entities
Page 185
Time for action - searching for cities using multiple core joins
Page 186
Preparing example data for multiple entities
Page 188
Downloading files for multiple entities
Page 189
Generating Solr documents
Page 189
Playing with joins on multicores (a core for every entity)
Page 190
Using sharding for distributed search
Page 190
Time for action - playing with sharding (distributed search)
Page 191
Time for action - finding a document from any shard
Page 193
Collecting some ideas on schemaless versus normalization
Page 195
Creating a single denormalized index
Page 196
Adding a field to track entity type
Page 196
Analyzing, designing, and refactoring our domain
Page 197
Using document clustering as a domain analysis tool
Page 197
Managing index replication
Page 201
Clustering Solr for distributed search using SolrCloud
Page 202
Taking a journey from single core to SolrCloud
Page 202
Understanding why we need Zookeeper
Page 203
Time for action - testing SolrCloud and Zookeeper locally
Page 203
Looking at the suggested configurations for SolrCloud
Page 205
Changing the schema.xml file
Page 205
Changing the solrconfig.xml file
Page 205
Knowing the pros and cons of SolrCloud
Page 206
Summary
Page 208
Chapter 8. Indexing External Data sources
Page 209
Stepping further into the real world
Page 209
Collecting example data from the Web Gallery of Art site
Page 213
Time for action - indexing data from a database (for example, a blog or an e-commerce website)
Page 215
Time for action - handling sub-entities (for example, joins on complex data)
Page 220
Time for action - indexing incrementally using delta imports
Page 222
Time for action - indexing CSV (for example, open data)
Page 224
Time for action - importing Solr XML document files
Page 225
Importing data from another Solr instance
Page 228
Indexing emails
Page 228
Time for action - indexing rich documents (for example, PDF)
Page 229
Adding more consideration about tuning
Page 230
Understanding Java Virtual Machine, threads, and Solr
Page 231
Choosing the correct directory for implementation
Page 231
Adopting Solr cache
Page 232
Time for action - indexing artist data from Tate Gallery and DBpedia
Page 233
Using DataImportHandler
Page 236
Summary
Page 238
Chapter 9. Introducing Customizations
Page 239
Looking at the Solr customizations
Page 239
Adding some more details to the core discovery
Page 240
Playing with specific languages
Page 241
Time for action - detecting language with Tika and LangDetect
Page 242
Introducing stemming for query expansion
Page 243
Time for action - adopting a stemmer
Page 245
Testing language analysis with JUnit and Scala
Page 246
Writing new Solr plugins
Page 247
Introducing Solr plugin structure and lifecycle
Page 248
Implementing interfaces for obtaining information
Page 248
Following an example plugin lifecycle
Page 248
Time for action - writing a new ResponseWriter plugin with the Thymeleaf library
Page 250
Using Maven for development
Page 254
Time for action - integrating Stanford NER for Named Entity extraction
Page 256
Pointing ideas for Solr's customizations
Page 259
Summary
Page 263
Appendix. Solr Clients and Integrations
Page 265
Introducing SolrJ - an embedded or remote Solr client using the Java (JVM) API
Page 265
Time for action - playing with an embedded Solr instance
Page 266
Choosing between an embedded or remote Solr instance
Page 268
Time for action - playing with an external HttpSolrServer
Page 269
Time for action - using Bean Scripting Framework and JavaScript
Page 271
Jahia CMS
Page 272
Magnolia CMS
Page 272
Alfresco DMS and CMS
Page 273
Liferay
Page 273
Broadleaf
Page 273
Apache Jena
Page 274
Solr Groovy or the Grails plugin
Page 274
Solr scala
Page 274
Spring data
Page 274
Writing Solr clients and integrations outside JVM
Page 275
JavaScript
Page 276
Taking a glance at ajax-solr, solrstrap, facetview, and jcloud
Page 277
Ruby
Page 281
Python
Page 281
C# and .NET
Page 281
PHP
Page 282
Drupal
Page 282
WordPress
Page 282
Magento e-commerce
Page 282
Platforms for analyzing, manipulating, and enhancing text
Page 283
Hydra
Page 283
UIMA
Page 283
Apache Stanbol
Page 283
Carrot2
Page 284
VuFind
Page 284
Summary
Page 285
Pop Quiz Answers
Page 287
Index
Page 297

Classifications

Library of Congress
TK5105.884, QA76.9.F5

The Physical Object

Format
paperback
Number of pages
324

Edition Identifiers

Open Library
OL37756397M
Internet Archive
apachesolrbeginn0000sera
ISBN 10
1782162526
ISBN 13
9781782162520
OCLC/WorldCat
867316995, 870467607

Work Identifiers

Work ID
OL27686333W

History

October 25, 2025 Edited by Drini import existing book
October 9, 2024 Edited by MARC Bot import existing book
October 8, 2023 Edited by raybb Bulk tagging works
December 17, 2022 Edited by ImportBot import existing book
March 14, 2022 Created by ImportBot import new book