This is part #5 in a (never ending?) series of articles on Indexing and Searching the ISFDB.org data using Solr.

When we left last time, I had basic support for multiple types of documents: Title documents (that were fairly well fleshed out) and Author documents (that were not). In this installment, I do some housekeeping and improve my modeling of Author-related data.

(If you are interested in following along at home, you can check out the code from GitHub. I’m starting at the blog_4 tag, and as the article progresses I’ll link to specific commits where I changed things, leading up to the blog_5 tag containing the end result of this article.)
Step #0: Cleaning Up My Messes
As I mentioned last time, I took a shortcut when adding my Author documents by reusing several fields in the schema.xml that existed for the Title documents. The first thing I wanted to do before adding more fields to my Author documents was to clean up the fields I already had, so there was a nice clear separation and I could feel comfortable tweaking fields w/o risk of breaking my other doc_type.

So I started by renaming all of my fields in the DIH and schema.xml config files so that Title and Author entities have distinct field names, except in the rare situations where the concept behind the fields really is the same for both types of entity:
- IMDb URLs
- Wikipedia URLs
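Once the fields are renamed, it is easy to check mechanically that the two doc_types only overlap where they are supposed to. The Python below is a minimal sketch of such a check. Only the IMDb/Wikipedia "shared concept" idea comes from the text. Every field name here is a hypothetical placeholder, not a field from the real schema.xml, and "id" and "doc_type" are assumed to be bookkeeping fields that every document carries.

```python
# Hypothetical field names: placeholders only, not the real schema.xml fields.

# Concepts that really are the same for both entity types keep a single name.
SHARED_CONCEPTS = {"imdb_url", "wikipedia_url"}
# Bookkeeping fields assumed to exist on every document, regardless of doc_type.
ALLOWED_SHARED = SHARED_CONCEPTS | {"id", "doc_type"}

TITLE_FIELDS = {"id", "doc_type", "title_name", "title_type", "seriesnum"} | SHARED_CONCEPTS
AUTHOR_FIELDS = {"id", "doc_type", "author_name", "author_birthplace"} | SHARED_CONCEPTS


def unexpected_overlap(title_fields, author_fields, allowed_shared):
    """Return (sorted) names used by both doc types that are not allowed to be shared."""
    return sorted((title_fields & author_fields) - allowed_shared)


if __name__ == "__main__":
    leftovers = unexpected_overlap(TITLE_FIELDS, AUTHOR_FIELDS, ALLOWED_SHARED)
    if leftovers:
        raise SystemExit(f"fields shared by accident: {leftovers}")
    print("Title and Author fields are cleanly separated")
```

Running a check like this against the field lists before and after a rename flags any name that is still reused between the two types without being a deliberately shared concept.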
- All Author docs were getting a “seriesnum” field
- All Title docs were getting an “author_id” field
<field ... /> syntax to declare every field I wanted in order to prevent these implicit fields from springing into existence, but before I got that far, it occurred to me that if I used an alternate ‘name’ in my SQL that did not exist in my schema.xml, DIH would (probably) ignore it (similar to how it happily deals with multiple values for single-valued fields). It worked like a charm.
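The aliasing trick works because, without explicit field declarations, DIH only copies a SQL column into the document when the column name matches a field in schema.xml. The Python below is a simplified model of that behavior, not DIH's actual code. It assumes the schema has no catch-all dynamicField (such as "*") that would match arbitrary names. The "seriesnum" and "author_id" names come from the list above. The alias "ignored_seriesnum" and the row values are hypothetical.

```python
# Simplified model (an assumption, not DIH source) of implicit column-to-field
# mapping: a column is kept only if schema.xml declares a field with that name.
SCHEMA_FIELDS = {"id", "doc_type", "author_id", "author_name", "title_name", "seriesnum"}


def implicit_doc(row, schema_fields):
    """Copy only the columns whose names match a field declared in the schema."""
    return {col: val for col, val in row.items() if col in schema_fields}


# Hypothetical row from the Author query, leaking a column named like a Title field.
leaky_row = {"id": "a1", "doc_type": "author", "author_name": "Example Author",
             "seriesnum": 3}

# The same row after aliasing that column in SQL to a name the schema lacks,
# e.g. "... AS ignored_seriesnum" (hypothetical alias).
aliased_row = {"id": "a1", "doc_type": "author", "author_name": "Example Author",
               "ignored_seriesnum": 3}

if __name__ == "__main__":
    print(implicit_doc(leaky_row, SCHEMA_FIELDS))    # still contains "seriesnum"
    print(implicit_doc(aliased_row, SCHEMA_FIELDS))  # the aliased column is dropped
```

The design choice is that the fix lives entirely in the SQL. The column can still be used inside the query (for example, in a join), but because its alias matches nothing in the schema, it never becomes a field on the document.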