NEPOMUK INDEXING ARCHITECTURE

Over the last 6 months, while I have been working for Blue Systems, Nepomuk has undergone a number of changes. The most public and noticeable change has been the major refactoring of the file indexer. One large part of this has been the migration away from Strigi. The other large part is the introduction of two-phase indexing. In 4.9 and earlier, the file indexing service had just one queue, whose speed could be controlled.
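To make the contrast with a single queue concrete, here is a minimal sketch of a two-queue indexer. The excerpt does not say what the two phases actually do, so this assumes a cheap first pass (basic file information) followed by a slower second pass (file contents). The class name, method names and the "first phase always wins" policy are all hypothetical and are not taken from the Nepomuk code.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical sketch: none of these names come from Nepomuk itself.
class TwoPhaseIndexer {
public:
    // New files always enter the (assumed) fast first-phase queue.
    void enqueue(const std::string& path) {
        assert(!path.empty());
        m_basicQueue.push_back(path);
    }

    // Process a single item and return a short description of the work done.
    // First-phase work takes priority, so newly seen files get a basic entry
    // quickly, even while slower second-phase work is still pending.
    std::string step() {
        if (!m_basicQueue.empty()) {
            const std::string path = m_basicQueue.front();
            m_basicQueue.pop_front();
            m_contentQueue.push_back(path);  // schedule the slower pass
            return "basic:" + path;
        }
        if (!m_contentQueue.empty()) {
            const std::string path = m_contentQueue.front();
            m_contentQueue.pop_front();
            return "content:" + path;
        }
        return std::string();  // nothing left to do
    }

    bool idle() const {
        return m_basicQueue.empty() && m_contentQueue.empty();
    }

private:
    std::deque<std::string> m_basicQueue;    // phase 1: assumed cheap metadata pass
    std::deque<std::string> m_contentQueue;  // phase 2: assumed expensive content pass
};
```

With a single queue, a file that arrives late has to wait behind every expensive job ahead of it. With two queues, it at least gets its cheap first pass as soon as the indexer takes its next step.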

NEPOMUK WITHOUT STRIGI

Strigi has always been a large part of Nepomuk. In fact, a lot of users still do not understand the difference between the two. It’s quite common to see bug reports mentioning “Strigi/Nepomuk”. Lots of blog posts do the same. Strigi consists of a number of different parts. In Nepomuk we only used libstreams and libstreamanalyzer. These were pure C++ libraries. The great thing about Strigi is that it is based on streams, instead of files.

NEPOMUK AND KDE 4.9.1

Last week 4.9.1 was tagged, and it should be released any day now. A large part of my time was spent fixing bugs and stabilizing some of the most user-visible features. This blog post highlights some of the more user-visible changes. File Watcher: Apart from minor fixes such as extra checks for buffer overflows and kernel version checks, we have two major improvements. Memory Leak: The Nepomuk File monitoring service had a serious memory leak in 4.

NEPOMUK WIDGETS REPOSITORY

With KDE 4.9, we introduced a new repository called nepomuk-core. This contained a combination of kdelibs/nepomuk and kde-runtime/nepomuk. It was created because of the API freeze present in kdelibs. Considering that most of the client libraries are thin wrappers over the runtime components, it made sense to combine them in one repository. In order to be compatible with kdelibs, the new library is installed with the Nepomuk2 namespace. Now with KDE 4.

NEPOMUK TECHNICAL DOCUMENTATION

I have been working a lot to improve Nepomuk’s technical documentation on TechBase. It has now finally reached the point where I think it covers most of the major aspects of Nepomuk. If you’re using Nepomuk in your applications, you should read it. It’s a little hard for me to objectively state whether I’ve explained things properly, considering that I have been so involved in the development. If you feel some section is lacking or you’re having trouble understanding it, please contact me.

FASTER NEPOMUK QUERIES

Nepomuk has a very decentralized architecture where the different components exist as different processes. They are all variants of the same executable - nepomukservicestub. This servicestub loads the appropriate service plugin. The main reason for doing this was stability. If one of the components crashes, then it doesn’t take all the other components with it. Unfortunately this architecture doesn’t hold up very well when the different components need to communicate with one another.

NEPOMUK WITHOUT FILES

Most people assume that if they switch off file indexing in Nepomuk, then all the Nepomuk file services will get disabled. This is however not the case. Nepomuk consists of two services which are used to deal with files: the Nepomuk File Watcher and the Nepomuk File Indexer. The Nepomuk File Indexer is responsible for calling the Strigi plugins to index the files, whereas the File Watcher is a general service that monitors file move, creation and deletion events.

BETTER UNIT TESTS

A little while ago, before Akademy, I started implementing the Shared Memory Hash Table, so as to improve Nepomuk’s architecture. Architectural designs really interest me. I decided to be smart about it: if I was going to refactor some of the main classes, there was a very high chance I would break something. The code is quite complex. I needed a way to make sure I was on the right track.

A FEW UPDATES

Just some quick updates: I graduated from college about a month back. I’m now working on Nepomuk full time, courtesy of Blue Systems. I’m now a KDE e.V. member. Yaye! :)

SHARED MEMORY HASH TABLE

For the last month I’ve been working on a hash table which is stored in shared memory and can thus easily be used across applications. This is ideal for simple caches of data that reside in multiple applications. My specific use case was the Nepomuk Resource class, which is a glorified cache of key-value pairs and uses a hash table. A considerable amount of effort has gone into making sure that each application’s Resource classes are consistent with the other applications.
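As a rough illustration of what makes a hash table suitable for shared memory, here is a minimal sketch. It is not the implementation described in the post. It assumes a fixed-capacity, open-addressing table with linear probing whose slots contain no pointers, so the same bytes are valid in every process that maps them. All names and sizes are hypothetical. Cross-process locking is left out entirely, and the memory block here is an ordinary buffer rather than a real shared segment (which could come from something like Qt's QSharedMemory or POSIX mmap).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical sizes, chosen arbitrarily for this sketch.
constexpr std::size_t kSlots = 64;
constexpr std::size_t kKeyLen = 32;
constexpr std::size_t kValueLen = 64;

// A slot holds its data inline: no pointers, so no process-specific addresses.
struct Slot {
    std::uint8_t used;
    char key[kKeyLen];
    char value[kValueLen];
};

struct SharedTable {
    Slot slots[kSlots];
};

static_assert(std::is_trivially_copyable<SharedTable>::value,
              "table must be plain bytes to live in shared memory");

// FNV-1a, a simple string hash that gives the same result in every process.
inline std::uint32_t hashKey(const char* key) {
    std::uint32_t h = 2166136261u;
    for (; *key; ++key) {
        h ^= static_cast<unsigned char>(*key);
        h *= 16777619u;
    }
    return h;
}

// Create an empty (zeroed) table inside a raw block. Only the process that
// creates the segment should call this; others would just cast the pointer.
inline SharedTable* initTable(void* block, std::size_t size) {
    assert(size >= sizeof(SharedTable));
    return new (block) SharedTable();
}

// Insert or overwrite. Returns false if the key or value is too long,
// or if the table is full.
inline bool tablePut(SharedTable& t, const char* key, const char* value) {
    if (std::strlen(key) >= kKeyLen || std::strlen(value) >= kValueLen)
        return false;
    const std::size_t start = hashKey(key) % kSlots;
    for (std::size_t i = 0; i < kSlots; ++i) {
        Slot& s = t.slots[(start + i) % kSlots];
        if (!s.used || std::strcmp(s.key, key) == 0) {
            std::strcpy(s.key, key);
            std::strcpy(s.value, value);
            s.used = 1;
            return true;
        }
    }
    return false;
}

// Look up a key. Returns nullptr if it is not present.
inline const char* tableGet(const SharedTable& t, const char* key) {
    const std::size_t start = hashKey(key) % kSlots;
    for (std::size_t i = 0; i < kSlots; ++i) {
        const Slot& s = t.slots[(start + i) % kSlots];
        if (!s.used)
            return nullptr;  // an empty slot ends the probe sequence
        if (std::strcmp(s.key, key) == 0)
            return s.value;
    }
    return nullptr;
}
```

Because the table is just bytes, a second process mapping the same segment sees the same entries without any serialization. The hard part, which this sketch skips, is keeping concurrent readers and writers consistent.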