Skip to content

Manticore or Sphinx SQLite 3 ODBC import with indextool and unixodbc (with support for delta updates)

I love SQLite and use it in every programming language i know, python, golang, php, i wanted to use this database i made to import it into Manticore/Sphinx but couldn’t find a lot of information about it. So that’s why i want to share it with you!

I’ve tested this setup on several Debian and Ubuntu distributions and it always works like a charm.

First install the needed library’s to enable ODBC and the communication between ODBC and SQLite on Linux.

sudo apt install unixodbc libsqlite3-dev libsqliteodbc 

Now we have the needed libraries we can go to the configuration part (/etc/manticoresearch/manticoresearch.conf) or /etc/sphinx/sphinx.conf for sphinx.

We’ll create an extra database to support delta updates, sph_counter.db, the same which i couldn’t get working for delta updates like described on the Manticore documentation pages.

First create the sph_counter.db

sqlite3 /full/path/to/sph_counter.db

Then create the schema needed for our delta updates, basically we insert a value for a counter (counter_id) and insert the number of documents we have already indexed, you could also change max_doc_id to some field where you have stores something like the update time for a record so you will only delta-update updated documents.

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

Create the ODBC source for your Manticore/Sphinx index in manticoresearch.conf or sphinx.conf

source srcodbc
{
        type = odbc
        odbc_dsn=Driver=SQLite3;Database=/full/path/to/my_index.db
        sql_column_buffers =name=2048
        sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents;
        sql_query_pre = attach database '/full/path/to/sph_counter.db' as sph;
        sql_query_post_index = attach database '/full/path/to/sph_counter.db' as sph;
        sql_query_post_index = REPLACE INTO sph_counter ( counter_id, max_doc_id )  VALUES ( '1', (SELECT MAX(last_update) FROM my_index))
        sql_query_post_index = detach database sph;

}

The sql_query_pre selects the max document id from Manticore/Sphinx so we know where to start and attaches the database to our SQLite database so we can update our last indexed document afterwards.

The sql_query_post_index inserts the max_doc_id from the SQLite database into the sph_counter table so we know where to start next time.

Afterwards create your index definition like usual:

index my_index
{
    stopwords      = /etc/manticoresearch/stopwords.txt
    source         = srcodbc
    path           = /var/lib/manticore/data/my_index
    html_strip     = 0
}

Run the indexer:

indextool --rotate my_index

Afterwards if you’ll run the same command it will do a delta update because we know the max_doc_id we’d insert into the sph_counter table.

If you want to convert the plain index into a Realtime index execute the following mysql command in the sphinxQL interface:

ATTACH INDEX my_index TO RTINDEX rtindex WITH TRUNCATE

Your plain index will be converted to a realtime index, as it will be converted and not be inserted your plain index will be lost, you have to reset the sph_counter max_doc_id and index it again if you want to use the plain-index again.

Comments 3

  • I do agree with all the ideas you’ve presented in your post. They are very convincing and will certainly work. Still, the posts are very short for newbies. Could you please extend them a little from next time? Thanks for the post.

  • Hi there, I found your blog via Google while looking for a related topic, your site came up, it looks good. I have bookmarked it in my google bookmarks.

  • Thanks for sharing your thoughts. I truly appreciate your efforts
    and I will be waiting for your next post thank you once again.

Leave a Reply

Your email address will not be published.

PHP Code Snippets Powered By : XYZScripts.com