Nepomuk and KDE 4.9.1

Last week 4.9.1 was tagged, and it should be released any day now. A large part of my time was spent fixing bugs and stabilizing some of the most user-visible features. This blog post highlights the more noticeable changes.

File Watcher

Apart from minor fixes such as extra checks for buffer overflows and kernel version checks, we have two major improvements:

Memory Leak

The Nepomuk File monitoring service had a serious memory leak in 4.9.0, which was unfortunately not caught in the beta or RC testing period. Depending on the number of directories present, the file watcher service would consume a good 2 GB of your RAM. Fortunately, many distributions patched their own tarballs before releasing to the public.

Reindexing events

Another bug was that many files were unnecessarily reindexed when they were opened in write mode, even though no changes had been made. With 4.9.1 we check whether the modification time of the file has actually changed, even when the file notification system reports the file as modified.

This should fix most of the problems of files unnecessarily being reindexed.
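The idea can be sketched in a few lines of Python. This is only an illustration and not the actual File Watcher code: the function name `needs_reindex` and the stored `last_indexed_mtime` value are assumptions made for the example.

```python
import os


def needs_reindex(path, last_indexed_mtime):
    """Return True only if the file's modification time differs from the
    one recorded during the previous indexing run.

    Hypothetical helper: `last_indexed_mtime` stands in for whatever
    timestamp the indexer stored for this file.
    """
    current_mtime = os.stat(path).st_mtime
    return current_mtime != last_indexed_mtime
```

A "file was closed after being opened for writing" notification would then only trigger a reindex when `needs_reindex` returns True.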

File Indexer

The File Indexer has also gone through a number of fixes:

Do not check for new files on startup

One of the most annoying things I found about file indexing was the initial check for new files on startup. Even though this check runs silently in the background, it was very annoying, especially for people who run Nepomuk all the time.

/images/scanning.png

Now, we only check if the strigi version has changed, and only then do we check for new files and for any files that could not be indexed the last time. If someone wants to forcibly check for new files, they can do so with a dbus call:

$ qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer updateAllFolders false

Or wait till 4.10, when I add an option to do so in the KCM.

Secondary Indexer

This patch technically made it into 4.9.0, but I never got the chance to blog about it. I've introduced a SimpleIndexer which serves as a backup when the strigi indexers provide incorrect data. This way, instead of having no data about the file, we at least have the basic information such as filename, url and mimetype. I have plans for a more concrete solution for 4.10, but that's for another blog post.
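To give a feel for what "basic information" means here, the following is a rough Python sketch of a fallback that records only the filename, url and mimetype. It is not the SimpleIndexer implementation; `basic_file_info` is a made-up name, and guessing the mimetype from the file extension is an assumption made to keep the example short.

```python
import mimetypes
from pathlib import Path


def basic_file_info(path):
    """Collect the minimal metadata a fallback indexer could store.

    Hypothetical helper: the mimetype is guessed from the extension
    only, which is a simplification for this sketch.
    """
    p = Path(path).resolve()
    mimetype, _encoding = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "url": p.as_uri(),
        "mimetype": mimetype or "application/octet-stream",
    }
```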

Nasty Deadlock

Another annoying bug that was not caught with 4.9.0 was a deadlock that rendered the file indexer service useless.

/images/nepomuk-deadlock.jpg

This only happened during the first run of Nepomuk, and led to some unfortunate publicity. In the end it was a very simple fix once a proper backtrace had been provided.

Strigi Analyzers

Another big change which probably has a large impact on indexing times is the blacklisting of certain file indexing (strigi) plugins. Currently by default, we only blacklist the SHA1 hash generator. With it blacklisted, we do not (hopefully) need to read the full contents of the file. This results in a noticeable performance improvement for large files.
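To illustrate why a hash generator is expensive for large files, here is a small Python sketch (not the strigi plugin itself; `sha1_of_file` is a name invented for this example). Computing a SHA1 digest means every byte of the file has to be read, whereas most metadata can be extracted from a small part of the file.

```python
import hashlib


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return (hex digest, bytes read) for the file at `path`.

    The byte counter is only there to show that the whole file
    has to be read in order to compute the hash.
    """
    digest = hashlib.sha1()
    bytes_read = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read
```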

Queries

File Queries

/images/dolphin_places_panel.png

One of my favorite features of Dolphin is its Places Panel with all of those defaults: Documents, Images, Audio and Video files. Unfortunately, those defaults were normal Nepomuk queries and not file queries. With 4.9.1, they have been changed to file queries, which lets them benefit from a number of optimizations on our end, including my minor optimizations in file queries.

Query Updates

Most applications which currently use Nepomuk utilize the Query Service in order to run queries. The Query Service can then provide updates on those queries if there is some change in the results. It used to do so by re-running ALL open queries whenever any data in Nepomuk changed. This was not a good approach, as it obviously does not scale.

With 4.9.1, we are now using some simple heuristics and the Resource Watcher to only re-run queries when affected data changes. So indexing a file will not result in the updating of all custom Nepomuk magic folders.
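As a very rough illustration of the general idea (and not of the actual heuristics or the Resource Watcher API), the Python sketch below remembers which resource types each open query depends on and only selects the queries touched by a change. The class name, method names and type labels are all hypothetical.

```python
from collections import defaultdict


class QueryRegistry:
    """Toy model: map resource types to the open queries that use them."""

    def __init__(self):
        self._queries_by_type = defaultdict(set)

    def register(self, query_id, resource_types):
        # Remember every type this query's results depend on.
        for resource_type in resource_types:
            self._queries_by_type[resource_type].add(query_id)

    def queries_to_rerun(self, changed_types):
        # Only queries that depend on a changed type are selected.
        affected = set()
        for resource_type in changed_types:
            affected |= self._queries_by_type.get(resource_type, set())
        return affected
```

In this model, a change to file data would select only the queries registered for file types, and leave a query over emails or contacts alone.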

Nepomuk Core Library

And finally, there have been a number of bug fixes and crash fixes in the NepomukCore library, but considering that they aren't really that user-visible, I'm not going to talk much about them.

Miscellaneous

There have been a large number of improvements in the Nepomuk Testing Suite, and more tests have been added. Most of the tests passed, but it still serves as a good way of checking regressions.

Along with that, there have been some fixes to Removable Media handling.

Soprano

Recently I blogged about certain major query optimizations, which would only be public in 4.10. This major optimization was done by skipping a large part of the Soprano code. After the blog post, we identified some of the bottlenecks and optimized them. The most impressive result was a reduction of one query from 6 seconds to 1 second.

Again, this isn't really a part of KDE 4.9.1, but it will be a part of Soprano 2.8.1. So when it is released, make sure you upgrade.

That's all for this release :)

Nepomuk Widgets Repository

With KDE 4.9, we introduced a new repository called nepomuk-core. This contained a combination of kdelibs/nepomuk and kde-runtime/nepomuk. It was created because of the API freeze present in kdelibs. Considering that most of the client libraries are thin wrappers over the runtime components, it made sense to combine them in one repository.

In order to be compatible with kdelibs, the new library is installed with the Nepomuk2 namespace.

Now with KDE 4.10 we are going to have another new repository, nepomuk-widgets. This repository contains the remaining GUI parts of kdelibs/nepomuk that were not moved to nepomuk-core.

Port your Applications

With this repo, we have all of the earlier functionality covered. With 4.10, the Nepomuk libraries will be deprecated. So, port your applications to Nepomuk2. I've updated the wiki with a short script that should take care of most of the changes. You will have to update your CMake files on your own.

Advantages

The kdelibs/nepomuk libraries only receive critical bug fixes, and some of the most important classes there do not have any kind of tests. With Nepomuk2, we have decent test coverage and active development. Plus, you get access to a number of new asynchronous APIs.

Nepomuk Technical Documentation

I have been working a lot to improve Nepomuk's technical documentation on the techbase. It has now finally reached the point where I think it covers most of the major aspects of Nepomuk. If you're using Nepomuk in your applications, you should read it.

/images/nepomuk.png

It's a little hard for me to objectively state if I've explained stuff properly considering that I have been so involved in the development. If you feel some section is lacking or you're having trouble understanding it, please contact me. I'll be happy to update it.

And with this, I can (somewhat) tick off one item from the Nepomuk BOF todo list.

Faster Nepomuk Queries

Nepomuk has a very decentralized architecture where the different components exist as different processes. They are all variants of the same executable, nepomukservicestub. This servicestub loads the appropriate service plugin. The main reason for doing this was stability. If one of the components crashes, it doesn't take all the other components with it.

Unfortunately this architecture doesn't hold up very well when the different components need to communicate with one another. In that case they need to use complex methods such as dbus or local sockets. Another problem is the increased memory consumption, because each process has its own internal cache (Nepomuk stuff) and other KDE specific stuff.

/images/query-storage-separate.png

If you ignore file handling in Nepomuk, we have two main services:

  • Storage Service
  • Query Service

The Storage Service is responsible for managing the ontologies, initializing virtuoso, and other data management functions. The Query Service exists for caching queries and running them in a separate thread.

Now the Query Service obviously needs to access the virtuoso database, and for that it needs to go through the Storage Service. This communication happens through a local socket, the same socket which all other applications use to access Nepomuk.

Last week, I finally merged the query service into the storage service.

/images/query-storage-merged.png

I was aiming for a small memory decrease, and a slight performance upgrade on the queries. Boy, was I wrong! The additional local socket seems to have been a huge bottleneck.

Here are some benchmarks from listing about 12,500 resources.

/images/queryservice-benchmarks.png

There are still many more performance upgrades that can be done, but this seemed like a good place to start :)

Nepomuk Without Files

Most people assume that if they switch off file indexing in Nepomuk, then all the Nepomuk file services will get disabled. This is however not the case. Nepomuk consists of two services which are used to deal with files:

  • Nepomuk File Watcher
  • Nepomuk File Indexer

The Nepomuk File Indexer is responsible for calling the strigi plugins to index the files, whereas the File Watcher is a general service that monitors file move, creation and deletion events. Even when the File Indexer is not running, files may have metadata attached to them: tags, ratings and comments. We need the File Watcher to update our database whenever the url of a file changes.

The File Watcher internally uses inotify, a kernel API for file monitoring. This API, while quite easy to use, does not allow us to recursively watch directories and, more importantly, does not provide file move events unless we are watching both the source and the destination directory.

We need file move events in order to track a file's url. This results in us having to create inotify watches for every single directory in your $HOME folder. This causes a large disk load on startup and is the cause of one of the critical bugs in Nepomuk. We have no solution until the kernel provides us with a better API.
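To get a feeling for the scale of the problem, the following Python sketch counts how many directories live below a given folder, which is roughly the number of inotify watches a non-recursive API forces us to create. It is only an estimate under simple assumptions (symlinked directories are not followed), and `count_directories` is a name made up for this example.

```python
import os


def count_directories(root):
    """Count `root` plus every directory below it.

    With one watch per directory, this approximates the number of
    watches needed to cover the whole tree.
    """
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))
```

On Linux, the result for your home folder can be compared with the per-user limit in /proc/sys/fs/inotify/max_user_watches.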

Anyway, Nepomuk is being used in KDE PIM and Telepathy (development version), and none of those use cases have anything to do with files. It doesn't make sense to make others pay the price of the file watcher when they are not doing anything related to files. So, with that in mind, please add the following lines to your nepomukserverrc if you do not care about files at all:

[Service-nepomukfilewatch]
autostart=false

Warning

With the Nepomuk FileWatch Service disabled, you'll still be able to tag and rate your files, but these annotations will be lost if you move or rename the file.
