<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Random Thoughts and Musings</title>
    <link>http://vhanda.in/blog</link>
    <description>Thoughts about KDE, Nepomuk and Open source</description>
    <pubDate>Fri, 01 Feb 2013 15:21:04 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>What's new with Nepomuk 4.10</title>
      <link>http://vhanda.in/blog/2013/01/what-new-with-nepomuk-4-10</link>
      <pubDate>Fri, 01 Feb 2013 20:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2013/01/what-new-with-nepomuk-4-10</guid>
      <description>What's new with Nepomuk 4.10</description>
      <content:encoded><![CDATA[<div class="document">
<img alt="/images/nepomuk.png" class="align-center" src="/images/nepomuk.png" />
<p>I've blogged about some of the more prominent changes in this new Nepomuk release. I thought it would be a good idea to document all the changes, most of which I haven't publicly blogged about.</p>
<div class="section" id="file-indexing">
<h1>File Indexing</h1>
<p>As the release announcement says, the file indexer has seen the most changes.</p>
<div class="section" id="new-double-queue-architecture">
<h2>New Double Queue Architecture</h2>
<p>We've <a class="reference external" href="http://vhanda.in/blog/2013/01/nepomuk-indexing-architecture/">split the work</a> of the indexer into two queues: basic indexing and full file indexing. Basic indexing quickly records the basic information about a file, such as the filename and mimetype. This means we can always answer at least simple queries. The other queue, which only runs when the user is idle, extracts the full information about the file.</p>
</div>
<div class="section" id="new-file-indexer">
<h2>New File Indexer</h2>
<p>We've had some problems with Strigi earlier. With 4.10, we have finally decided to <a class="reference external" href="http://vhanda.in/blog/2012/11/nepomuk-without-strigi/">release our own solution</a>. Our solution is arguably technologically inferior, but it's more maintainable and, for now, provides a better user experience.</p>
</div>
<div class="section" id="mimetype-filtering">
<h2>Mimetype Filtering</h2>
<p>Mimetypes are central to the new file indexing architecture: every file indexing plugin uses mimetypes to declare which types of files it can index. Building on this, we decided to let the user control which types of files are indexed.</p>
<img alt="/images/nepomuk-mimetype.jpg" class="align-center" src="/images/nepomuk-mimetype.jpg" />
<p>By default, source code is no longer indexed. Common types such as Documents, Images, Audio and Videos still are.</p>
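<p>To make the idea concrete, here is a minimal Python sketch of mimetype-based filtering. It is only an illustration, not Nepomuk's actual C++ code: the function names and the exclusion patterns in <tt class="docutils literal">DEFAULT_EXCLUDED</tt> are assumptions made up for this example.</p>
<pre class="literal-block">
```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns for source code; the real defaults
# live in Nepomuk's own configuration and may differ.
DEFAULT_EXCLUDED = ["text/x-*src", "text/x-*hdr", "text/x-python", "application/x-java"]


def should_index(mimetype, excluded=DEFAULT_EXCLUDED):
    """Return True unless the mimetype matches one of the excluded patterns."""
    return not any(fnmatchcase(mimetype, pattern) for pattern in excluded)


def filter_indexable(files, excluded=DEFAULT_EXCLUDED):
    """Keep only the (path, mimetype) pairs whose mimetype is allowed."""
    return [(path, mime) for path, mime in files if should_index(mime, excluded)]
```
</pre>
<p>With these assumed patterns, an image or a PDF passes the filter while a C++ source file is skipped.</p>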
</div>
</div>
<div class="section" id="kioslave-changes">
<h1>KioSlave changes</h1>
<p>Till the 4.9 release, the kioslave code hadn't changed much. With 4.9.1, we managed to optimize some of the code. The 4.10 release however takes this to an entirely different level.</p>
<div class="section" id="massive-optimizations">
<h2>Massive Optimizations</h2>
<p>The 'nepomuksearch' kioslave could initially show both non-file and file data. This meant that it would also occasionally show contacts, albums and other details. Selecting any of those would result in another search for resources related to that contact. For this release, we decided to optimize for the most common use case of listing files.</p>
<p>The 'nepomuksearch' kioslave, and all other nepomuk kioslaves, no longer show any result which does not have a URL. This, coupled with a LOT of other optimizations, has yielded a super fast kioslave which can display thousands of results in under a second.</p>
<p>There is also some <a class="reference external" href="http://userbase.kde.org/Nepomuk/kioslaves/search">interesting userbase documentation</a> about custom queries on the nepomuksearch kioslave.</p>
</div>
<div class="section" id="tagging-kioslave">
<h2>Tagging KioSlave</h2>
<p>As previously stated, we are also introducing a new <a class="reference external" href="http://vhanda.in/blog/2013/01/nepomuk-tags-kioslave/">tagging kioslave</a>. This slave allows you to easily manage your Nepomuk tags, and browse files based on the tags they carry.</p>
<img alt="/images/nepomuk-tags-browsing.png" class="align-center" src="/images/nepomuk-tags-browsing.png" />
</div>
</div>
<div class="section" id="file-metadatawidget">
<h1>File MetadataWidget</h1>
<img alt="/images/nepomuk-filemetadatawidget.png" class="align-center" src="/images/nepomuk-filemetadatawidget.png" />
<p>One of the largest parts of the Dolphin Information Panel was the <a class="reference external" href="http://api.kde.org/4.10-api/kdelibs-apidocs/kio/html/classKFileMetaDataWidget.html">KFileMetadataWidget</a> which was provided by kdelibs/kio. This widget was one of the last parts of Dolphin that still used Nepomuk1. Since kdelibs was frozen, we couldn't port it to Nepomuk2. Thus emerged the Nepomuk2::FileMetadataWidget in <a class="reference external" href="http://vhanda.in/blog/2012/08/nepomuk-widgets-repository/">nepomuk-widgets</a>.</p>
<p>The KFileMetadataWidget historically fetched all the data in another process. This was done because Strigi was a little unreliable. With KDE Workspaces 4.10, we are no longer using Strigi in Nepomuk. This means the widget now uses the nepomukindexer to extract the data. It also no longer uses this multi-process architecture when loading the Nepomuk data. This results in a massive performance improvement because we can rely on the Nepomuk cache in Dolphin instead of recreating it each time.</p>
<p>In terms of appearance, the widget has become a little more uniform, and by default only shows the properties that really matter.</p>
</div>
<div class="section" id="improved-removable-media-handling">
<h1>Improved Removable Media Handling</h1>
<p>Nepomuk has for quite some time supported indexing of removable media. However, it didn't always work that well. From a design point of view, the solution was great and extremely robust. This, however, came at a steep cost for the rest of Nepomuk. Every other query was affected by these features, and not in a small way. For some simple tests of basic indexing, it made a difference of around 20%.</p>
<p>With this new release, we have gone to a simpler solution which has a lighter performance cost. We have also removed the &quot;Automatic Invalid File Metadata Cleaner&quot; which removed the metadata for any file it could not access. The client code now always checks if the file can be accessed before displaying it to the user.</p>
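<p>The client-side check can be pictured with a short Python sketch. This is an illustration only, not the code Nepomuk clients actually use; the function name <tt class="docutils literal">visible_results</tt> is an assumption for this example.</p>
<pre class="literal-block">
```python
import os


def visible_results(paths):
    """Return only the paths that exist and are readable right now.

    Results on unmounted removable media are simply hidden here;
    their metadata is left untouched in the store.
    """
    return [path for path in paths if os.access(path, os.R_OK)]
```
</pre>
<p>The point of the design is that nothing gets deleted: when the media is mounted again, the same results become visible without any reindexing.</p>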
</div>
<div class="section" id="nepomuk-backup-changes">
<h1>Nepomuk Backup Changes</h1>
<p>With KDE Workspaces 4.6, my Google Summer of Code Project, Nepomuk Backup, was finally merged. It was a very ambitious project which attempted to synchronize, backup and restore data in a non-destructible manner. In the end, it was just a little bit too complex. Large parts of the synchronization code eventually migrated into the data feeding code, which is now used by anyone pushing data into Nepomuk. So, it wasn't a complete loss.</p>
<p>With this new release, I finally got around to throwing away most of the complex code, and implementing a very simple and reliable backup solution. This new method does not require a separate service to be running, and therefore consumes less memory. Additionally, we also have some basic unit tests to ensure that the backups are restored properly!</p>
<p>Please keep in mind that this only backs up the non-destructible data. This does not include the file or email index information. If you want that to be backed up, you're better off just making a copy of the database file.</p>
</div>
<div class="section" id="nepomuk-cleaner">
<h1>Nepomuk Cleaner</h1>
<img alt="/images/nepomuk-cleaner.png" class="align-center" src="/images/nepomuk-cleaner.png" />
<p>The Nepomuk Cleaner originated from a series of scripts I was writing to clear up my own database. It eventually occurred to me that other people might suffer from the same problem. The scripts were eventually combined into a cohesive form, and released. The application is very simple right now, but that will change in future releases. I even contemplated not releasing it for 4.10, but it clearly provides some value, even if it doesn't look that great.</p>
</div>
<div class="section" id="other-changes">
<h1>Other Changes</h1>
<p>Surprisingly, I didn't want to include many new features in this release. I was trying to focus more on stabilization. Over the <a class="reference external" href="https://bugs.kde.org/weekly-bug-summary.cgi?tops=70&amp;days=180">last 6 months</a>, a total of 246 bugs have been resolved, out of which 188 were reported within the last 6 months. This seems like a good improvement to me.</p>
<p>Apart from these simple changes there have been a number of optimizations all across Nepomuk and Soprano. Nepomuk should be running faster and better than ever before. In some cases we have even seen an over 200% increase in performance.</p>
<p>Anyway, Enjoy the new release! :)</p>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Tags Kioslave</title>
      <link>http://vhanda.in/blog/2013/01/nepomuk-tags-kioslave</link>
      <pubDate>Thu, 31 Jan 2013 19:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2013/01/nepomuk-tags-kioslave</guid>
      <description>Nepomuk Tags Kioslave</description>
      <content:encoded><![CDATA[<div class="document">
<p>Nepomuk has long required a convenient way of managing tags. I've previously tried this with a simple <a class="reference external" href="http://vhanda.in/blog/2012/01/nepomuk-tag-manager/">Tag Managing Application</a>, but that wasn't something that we wanted to ship. For the KDE Workspace 4.10 release, we are releasing a Nepomuk Tags kioslave.</p>
<div class="section" id="listing-tags">
<h1>Listing Tags</h1>
<p>The kioslave provides a very convenient way of listing all the tags. You can even rename and delete tags, just like you would for any other folder.</p>
<img alt="/images/nepomuk-tags-root.png" class="align-center" src="/images/nepomuk-tags-root.png" />
</div>
<div class="section" id="browsing-files">
<h1>Browsing Files</h1>
<p>Nepomuk has always provided users a way to browse tags, but it was only one tag at a time. This seemed fairly limiting. One could browse by more tags, but then you would have had to write the <a class="reference external" href="http://userbase.kde.org/Nepomuk/kioslaves/search">query</a> yourself.</p>
<img alt="/images/nepomuk-tags-browsing.png" class="align-center" src="/images/nepomuk-tags-browsing.png" />
<p>With this <a class="reference external" href="http://userbase.kde.org/Nepomuk/kioslaves/tags">kioslave</a>, you can finally browse the files based on the tags, and then filter the search even more by selecting more tags.</p>
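<p>Narrowing a listing by several tags boils down to a set intersection. Here is a minimal Python sketch of that idea. It is not the kioslave's actual C++ code; the function names and the in-memory dictionary standing in for the Nepomuk store are assumptions for this example.</p>
<pre class="literal-block">
```python
def files_with_all_tags(tagged_files, selected_tags):
    """Return the sorted files that carry every tag in selected_tags."""
    wanted = set(selected_tags)
    return sorted(path for path, tags in tagged_files.items() if wanted.issubset(tags))


def remaining_tags(tagged_files, selected_tags):
    """Return the tags that can still narrow the current listing further."""
    wanted = set(selected_tags)
    found = set()
    for path in files_with_all_tags(tagged_files, selected_tags):
        found.update(tagged_files[path])
    return sorted(found.difference(wanted))
```
</pre>
<p>Each extra tag you select shrinks the file list, and <tt class="docutils literal">remaining_tags</tt> models the sub-folders that would still be offered at that level.</p>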
</div>
<div class="section" id="applying-tags">
<h1>Applying Tags</h1>
<p>The kioslave also supports adding tags in bulk. Just drag and drop (or copy) the files into the tag's folder, and the appropriate tags will be applied.</p>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Cleaner</title>
      <link>http://vhanda.in/blog/2013/01/nepomuk-cleaner</link>
      <pubDate>Wed, 30 Jan 2013 12:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2013/01/nepomuk-cleaner</guid>
      <description>Nepomuk Cleaner</description>
      <content:encoded><![CDATA[<div class="document">
<p>Nepomuk has a unique problem of maintaining an RDF store. Unlike traditional SQL based stores, RDF offers a very loose schema, which is a HUGE advantage. Unfortunately all of the current RDF stores do not support any form of schema enforcement. It's up to the client code to make sure that the data being pushed is valid.</p>
<p>This has resulted in a number of problems such as strings being stored where an integer should go.</p>
<p>With the KDE Workspace 4.7 release, we started employing our own form of schema enforcement in the Nepomuk Storage Service, but the old incorrect data still remains.  Also, as Nepomuk has evolved as a project, we have found better ways to store data. Since the schemas are so loose, we could easily store both the old and the new data without any problems on the database level. This obviously results in more complex client code which has to handle both legacy and new data.</p>
<p>For this release, we decided to clean up the code to a certain extent and stop supporting some of the legacy data. We also decided to ship a very basic application called the &quot;<strong>Nepomuk Cleaner</strong>&quot;.</p>
<img alt="/images/nepomuk-cleaner.png" class="align-center" src="/images/nepomuk-cleaner.png" />
<p>This application is responsible for porting any legacy data, clearing up incorrect data, and merging duplicate data. We recommend that all users run it at least once. It will improve performance in all areas of Nepomuk, including a significant improvement in the indexing speed of emails.</p>
<p>With the 4.11 release, we're planning to improve the interface, add more cleaning jobs, and make running this application mandatory. That way we can safely remove all the legacy code paths.</p>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Indexing Architecture</title>
      <link>http://vhanda.in/blog/2013/01/nepomuk-indexing-architecture</link>
      <pubDate>Tue, 29 Jan 2013 19:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2013/01/nepomuk-indexing-architecture</guid>
      <description>Nepomuk Indexing Architecture</description>
      <content:encoded><![CDATA[<div class="document">
<p>Over the last 6 months, while I have been working for <a class="reference external" href="http://www.blue-systems.com/">Blue Systems</a>, Nepomuk has undergone a number of changes. The most public and noticeable change has been the major refactoring of the file indexer. One large part of this has been the <a class="reference external" href="http://vhanda.in/blog/2012/11/nepomuk-without-strigi/">migration from Strigi</a>. The other large part is the introduction of 2 phase indexing.</p>
<p>With 4.9 and earlier, the file indexing service used to just have one queue, whose speed could be controlled. This queue was filled on startup by comparing the mtime of the file with the one stored in the database. This would involve scanning through all the indexed folders. Once the scan was complete it would also listen to the file watcher to be notified when a file is modified or created.</p>
<p>This architecture had some shortcomings -</p>
<ul class="simple">
<li>Indexing each file is a time consuming process, and it involves extracting and pushing large amounts of data in Nepomuk.</li>
<li>Since this process was slow and we did not want to annoy the users, artificial delays were introduced, and their length depended on whether the user was idle.</li>
<li>The entire indexing process was suspended when on battery</li>
<li>Faulty files which cannot be indexed do not have any information stored, and could not even be searched by filename</li>
</ul>
<p>With this new release, we have split the indexing into 2 parts - Basic Indexing and File Indexing. The basic indexing just extracts the stat information and mimetype of the file. Whereas, the file indexing actually extracts data from the file.</p>
<p>This basic indexing is always enabled, and is very fast. It can process around 10-20 files per second. Also, it consumes very little cpu. Extracting this basic information first allows us to search by type and filename, and enables the timeline kioslave to work properly.</p>
<p>The file indexing is the relatively heavy process that is only run when the user is idle, <a class="reference external" href="http://userbase.kde.org/Nepomuk/FileIndexer#Changing_the_default_behavior">by default</a>.</p>
<p>This two phase architecture allows us to still index all the files, while providing a relatively light burden to the user. It also allows us to provide finer control than a simple on/off switch. For example - Now when on battery, file indexing is disabled, but the simple indexing still continues.</p>
<p>This new approach will also allow us to provide more user feedback in future releases, such as an indexing progress bar.</p>
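<p>The two phases can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not the real indexer (which is a C++ service): the class name, the dictionary used as the "database", and the <tt class="docutils literal">extract</tt> callback are all made up for illustration.</p>
<pre class="literal-block">
```python
from collections import deque


class TwoPhaseIndexer:
    """Toy model: a basic queue that always runs, a full queue gated on idleness."""

    def __init__(self):
        self.basic_queue = deque()
        self.full_queue = deque()
        self.index = {}

    def add_file(self, path, mimetype):
        self.basic_queue.append((path, mimetype))

    def run_basic(self):
        """Phase 1: store cheap metadata, then hand the file to phase 2."""
        while self.basic_queue:
            path, mimetype = self.basic_queue.popleft()
            self.index[path] = {"mimetype": mimetype, "full": False}
            self.full_queue.append(path)

    def run_full(self, user_idle, on_battery, extract):
        """Phase 2: run the expensive extractor only when idle and not on battery."""
        if on_battery or not user_idle:
            return 0
        processed = 0
        while self.full_queue:
            path = self.full_queue.popleft()
            try:
                self.index[path].update(extract(path))
                self.index[path]["full"] = True
            except Exception:
                # A faulty file keeps its basic entry, so it stays searchable.
                pass
            processed += 1
        return processed
```
</pre>
<p>Because phase 1 has already written an entry before the extractor ever runs, a file that breaks the extractor can still be found by name and type.</p>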
<div class="section" id="summary">
<h1>Summary</h1>
<p>The new architecture is much faster and more resilient to abnormal files and faulty plugins. It tries to save the basic information first, so that one can easily answer simple queries. The full file information is stored later, when the user is idle.</p>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk without Strigi</title>
      <link>http://vhanda.in/blog/2012/11/nepomuk-without-strigi</link>
      <pubDate>Wed, 07 Nov 2012 22:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/11/nepomuk-without-strigi</guid>
      <description>Nepomuk without Strigi</description>
      <content:encoded><![CDATA[<div class="document">
<p>Strigi has always been a large part of Nepomuk. In fact a lot of users still do not understand the difference between the two. It's quite common to see bug reports mentioning &quot;Strigi/Nepomuk&quot;. Lots of blog posts do the same.</p>
<p>Strigi consists of a number of different parts. In Nepomuk we only used libstreams and libstreamanalyzer. These are pure C++ libraries. The great thing about Strigi is that it is based on streams instead of files. So one can theoretically even extract metadata from the album image embedded inside an audio file. It's very powerful. Unfortunately, everything comes at a price, and this increased &quot;awesomeness&quot; comes with increased complexity. Additionally, with it being a pure C++ (no Qt or KDE) library, contributing is harder.</p>
<p>For 4.10, we decided to make a drastic change and move away from Strigi. There are a large number of <a class="reference external" href="http://mail.kde.org/pipermail/nepomuk/2012-September/003167.html">reasons</a> for doing so. Apart from the technical ones there was also an economic one: a large code base like Strigi is difficult to maintain and comes with a lot of added complexity.</p>
<p>Our own solution is based only on files (not streams) thereby making it a lot simpler. It directly uses the Nepomuk and KDE libraries, thereby making integration very simple. Integrating Strigi in Nepomuk required a lot of code.</p>
<p>This new file indexer currently resides in the nepomuk-core repository and does not have a public interface. I'm still debating whether it should be public for 4.10. If it does get a public interface, one could theoretically write plugins in other languages.</p>
<p>So far we have 5 indexers -</p>
<ul class="simple">
<li>Image File - Based on Exiv2</li>
<li>Video Files - Based on ffmpeg (We might move to gstreamer)</li>
<li>Audio Files - Taglib</li>
<li>PDF Files - Poppler</li>
<li>Plain Text files</li>
</ul>
<p>Writing file indexers for Nepomuk is now very simple. In fact these 5 indexers combined are just 500 lines. Here is the important part of the plain text extractor -</p>
<pre class="literal-block">
QTextStream ts( &amp;file );
QString contents = ts.readAll();

int characters = contents.length();
int lines = contents.count( QChar('\n') );
int words = contents.count( QRegExp(&quot;\\b\\w+\\b&quot;) );

SimpleResource fileRes( resUri );
fileRes.addType( NFO::PlainTextDocument() );
fileRes.addProperty( NIE::plainTextContent(), contents );
fileRes.addProperty( NFO::wordCount(), words );
fileRes.addProperty( NFO::lineCount(), lines );
fileRes.addProperty( NFO::characterCount(), characters );
</pre>
<p>The current file indexers cover most of the commonly used files, but they still need to be polished. So, if you're interested in contributing to Nepomuk, here is your chance.</p>
<p>I've managed to <a class="reference external" href="http://community.kde.org/Projects/Nepomuk/FileIndexing">catalog</a> some of the different files that I know we support. Our current indexers support many more formats, they just need to be properly tested.</p>
<p>If you're interested in helping, you can start by running nepomuk-core, and manually indexing the different file formats and updating this <a class="reference external" href="http://community.kde.org/Projects/Nepomuk/FileIndexing">page</a>. If you're a developer, feel free to checkout nepomuk-core, and start writing extractors. I've written a simple <a class="reference external" href="http://techbase.kde.org/Projects/Nepomuk/IndexingPlugin">guide</a>.</p>
<p>Btw, all of this Nepomuk awesomeness is powered by Blue Systems!</p>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk and KDE 4.9.1</title>
      <link>http://vhanda.in/blog/2012/09/nepomuk-and-kde-4.9.1</link>
      <pubDate>Mon, 03 Sep 2012 15:36:55 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/09/nepomuk-and-kde-4.9.1</guid>
      <description>Nepomuk and KDE 4.9.1</description>
      <content:encoded><![CDATA[<div class="document">
<p>Last week 4.9.1 was tagged, and it should release any day now. A large part of my time was spent bug fixing and stabilizing some of the most user visible features. This blog post highlights some of the more user visible changes.</p>
<div class="section" id="file-watcher">
<h1>File Watcher</h1>
<p>Apart from minor fixes such as extra checks for <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/5609c4cdd8c7d938a9b3e99285b1044eea2fcf04">buffer overflows</a>, and kernel <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/55761d35bb9e9ce863797b742c301d947dab61d0">version checks</a>, we have two major improvements:</p>
<div class="section" id="memory-leak">
<h2>Memory Leak</h2>
<p>The Nepomuk File monitoring service had a <a class="reference external" href="https://bugs.kde.org/show_bug.cgi?id=304476">serious memory leak</a> in 4.9.0, which was unfortunately not caught in the beta or RC testing period. Depending on the number of directories present, the file watcher service would <strong>consume a good 2 GB of your RAM</strong>. Fortunately, many distributions patched their own tarballs before releasing to the public.</p>
</div>
<div class="section" id="reindexing-events">
<h2>Reindexing events</h2>
<p>Another bug was that lots of files were being unnecessarily reindexed when they were opened in write mode even though no changes had been made. With 4.9.1 we actually <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/48d909c8aa4baca11e7bc1cf4ba5a23c1474fc22">check</a> if the modification time of the file has changed, even when the file notification system tells us otherwise.</p>
<p>This should fix most of the problems of files unnecessarily being reindexed.</p>
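<p>The idea behind the check can be sketched in Python as follows. This is an illustration under assumptions, not the actual nepomuk-core patch: the function names and the dictionary holding the stored modification times are hypothetical.</p>
<pre class="literal-block">
```python
import os


def needs_reindex(path, stored_mtime):
    """Reindex only if the on-disk modification time differs from the stored one."""
    return int(os.stat(path).st_mtime) != int(stored_mtime)


def handle_write_event(path, stored_mtimes, reindex):
    """Handle a 'file was opened for writing' notification.

    The notification alone is not trusted; the file is reindexed only
    when its modification time has really changed.
    """
    if needs_reindex(path, stored_mtimes.get(path, -1)):
        reindex(path)
        stored_mtimes[path] = int(os.stat(path).st_mtime)
        return True
    return False
```
</pre>
<p>A file that was merely opened in write mode and closed again keeps its old modification time, so the event is dropped without touching the index.</p>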
</div>
</div>
<div class="section" id="file-indexer">
<h1>File Indexer</h1>
<p>The File Indexer has also gone through a number of fixes -</p>
<div class="section" id="do-not-check-for-new-files-on-startup">
<h2>Do not check for new files on startup</h2>
<p>One of the most annoying things I found about file indexing was the initial check for new files on startup. Even though this check runs silently in the background, it was very annoying, especially for people who run Nepomuk all the time.</p>
<img alt="/images/scanning.png" class="align-center" src="/images/scanning.png" />
<p><a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/150a55d5eaabd8ad9a97112214fc2e008e9a1d11">Now</a>, we only check if the strigi version has changed, and only then do we check for new files / check for any files that could not be indexed the last time. If someone wants to forcibly check for new files, they can do so with a dbus call -</p>
<p><tt class="docutils literal">$ qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer updateAllFolders false</tt></p>
<p>Or wait till 4.10, when I add an option to do so in the KCM.</p>
</div>
<div class="section" id="secondary-indexer">
<h2>Secondary Indexer</h2>
<p>This patch technically made it into 4.9.0, but I never got the chance to blog about it. I've introduced a <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/414fd4c1c3c358aab70e1e10dd726ea2c1432e1f">SimpleIndexer</a> which serves as a backup when the strigi indexers provide incorrect data. This way, instead of having no data about the file, we at least have the basic information such as filename, url and mimetype. I have plans for a more concrete solution for 4.10, but that's for another blog post.</p>
</div>
<div class="section" id="nasty-deadlock">
<h2>Nasty Deadlock</h2>
<p>Another <a class="reference external" href="https://bugs.kde.org/show_bug.cgi?id=304982">annoying bug</a> that was not caught with 4.9.0 was a deadlock that rendered the file indexer service useless.</p>
<img alt="/images/nepomuk-deadlock.jpg" class="align-center" src="/images/nepomuk-deadlock.jpg" />
<p>This only happened during the first run of Nepomuk, and led to some <a class="reference external" href="http://www.dedoimedo.com/computers/fedora-17-kde.html">unfortunate publicity</a>. In the end it was a very simple fix once the proper backtrace had been provided.</p>
</div>
<div class="section" id="strigi-analyzers">
<h2>Strigi Analyzers</h2>
<p>Another big change which probably has a large impact on indexing times is the blacklisting of certain file indexing (strigi) plugins. Currently by default, we only <a class="reference external" href="https://bugs.kde.org/show_bug.cgi?id=303670">blacklist the SHA1 hash generator</a>. With it blacklisted, we do not (hopefully) need to read the full contents of the file. This results in a noticeable performance improvement for large files.</p>
</div>
</div>
<div class="section" id="queries">
<h1>Queries</h1>
<div class="section" id="file-queries">
<h2>File Queries</h2>
<img alt="/images/dolphin_places_panel.png" class="align-center" src="/images/dolphin_places_panel.png" />
<p>One of my favorite features of Dolphin is their Places Panel with all of those defaults - Documents, Images, Audio and Video files. Unfortunately, those defaults were normal Nepomuk queries and not file queries. With 4.9.1, they have been changed to file queries which lets them access a number of optimizations on our end, and allows them to benefit from my <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/328fbfd8a6fc66bf0b10bda7813b4827e3118d72">minor optimizations</a> in file queries.</p>
</div>
<div class="section" id="query-updates">
<h2>Query Updates</h2>
<p>Most applications which currently use Nepomuk utilize the QueryService in order to run queries. The Query Service can then provide updates on those queries if there is some change in the results - It used to do so by running <strong>ALL open queries when any data in Nepomuk changes</strong>. This was not a good approach, as it obviously does not scale.</p>
<p>With 4.9.1, we are now using <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/ead226c9571a15da8d7a92810f5c4afd35bf9de8">some simple heuristics</a> and the Resource Watcher to only re-run queries when affected data changes. So indexing a file will not result in the updating of all custom Nepomuk magic folders.</p>
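<p>One way to picture such a heuristic is sketched below in Python. It is only a guess at the general shape, not the Query Service's real logic: the class name, the idea of keying each query on resource types, and the example type names are assumptions for this illustration.</p>
<pre class="literal-block">
```python
class QueryRegistry:
    """Toy model: each open query declares which resource types it depends on."""

    def __init__(self):
        self.watched = {}

    def open_query(self, name, resource_types):
        self.watched[name] = set(resource_types)

    def queries_to_rerun(self, changed_types):
        """Return only the queries whose watched types overlap the change."""
        changed = set(changed_types)
        return sorted(
            name for name, types in self.watched.items()
            if not types.isdisjoint(changed)
        )
```
</pre>
<p>Compared with re-running every open query on every change, the work now scales with the number of queries that could actually be affected.</p>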
</div>
</div>
<div class="section" id="nepomuk-core-library">
<h1>Nepomuk Core Library</h1>
<p>And finally, there have been a number of <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/e4d8cd1f76192dc798f2db09b9e19310d7c1d65f">bug fixes</a> and <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/7bef7c53d3b9a971c203ed4391bf19ac79f381f5">crash fixes</a> in the NepomukCore library, but considering that they aren't really that user visible, I'm not going to talk much about them.</p>
</div>
<div class="section" id="miscellaneous">
<h1>Miscellaneous</h1>
<p>There have been a large number of improvements in the Nepomuk Testing Suite, and more tests have been added. Most of the tests passed, but it still serves as a good way of checking regressions.</p>
<p>Along with that there have been some <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core/repository/revisions/24caa3821aed71e590a3e55a76c6e4bc08f7d9d5">fixes</a> with Removable Media handling.</p>
</div>
<div class="section" id="soprano">
<h1>Soprano</h1>
<p>Recently I blogged about certain major query optimizations, which would only be public in 4.10. This major optimization was done by skipping a large part of the Soprano code. After the blog post, we identified some of the bottle necks, and optimized them. The most impressive result was a reduction of one query from <strong>6 seconds to 1 second</strong>.</p>
<p>Again, this isn't really a part of KDE 4.9.1, but it will be a part of Soprano 2.8.1. So when it releases, make sure you upgrade.</p>
<p>That's all for this release :)</p>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Widgets Repository</title>
      <link>http://vhanda.in/blog/2012/08/nepomuk-widgets-repository</link>
      <pubDate>Tue, 28 Aug 2012 13:36:18 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/08/nepomuk-widgets-repository</guid>
      <description>Nepomuk Widgets Repository</description>
      <content:encoded><![CDATA[<div class="document">
<p>With KDE 4.9, we introduced a new repository called <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-core">nepomuk-core</a>. This contained a combination of <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/repository/revisions/master/show/nepomuk">kdelibs/nepomuk</a> and <a class="reference external" href="https://projects.kde.org/projects/kde/kde-runtime/repository/revisions/master/show/nepomuk">kde-runtime/nepomuk</a>. It was created because of the API freeze present in kdelibs. Considering that most of the client libraries are thin wrappers over the runtime components, it made sense to combine them in one repository.</p>
<p>In order to be compatible with kdelibs, the new library is installed with the Nepomuk2 namespace.</p>
<p>Now with KDE 4.10 we are going to have another new repository <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/nepomuk-widgets">nepomuk-widgets</a>. This repository contains the remaining GUI parts of <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/repository/revisions/master/show/nepomuk">kdelibs/nepomuk</a> that were not moved to <tt class="docutils literal"><span class="pre">nepomuk-core</span></tt>.</p>
<div class="section" id="port-your-applications">
<h1>Port your Applications</h1>
<p>With this repo, we have all of the earlier functionality covered. <strong>With 4.10, the Nepomuk libraries will be deprecated</strong>. So, port your applications to <tt class="docutils literal">Nepomuk2</tt>. I've updated <a class="reference external" href="http://techbase.kde.org/Projects/Nepomuk/Nepomuk2Port">the wiki</a> with a short script that should take care of most of the changes. You will have to update your CMake files on your own.</p>
</div>
<div class="section" id="advantages">
<h1>Advantages</h1>
<p>The <a class="reference external" href="https://projects.kde.org/projects/kde/kdelibs/repository/revisions/master/show/nepomuk">kdelibs/nepomuk</a> libraries are in a critical bug-fix state only. That being said, some of the most important classes over there do not have any kind of tests. With <tt class="docutils literal">Nepomuk2</tt>, we have decent test coverage, and active development. Plus, you get access to a number of new asynchronous APIs.</p>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Technical Documentation</title>
      <link>http://vhanda.in/blog/2012/08/nepomuk-technical-documentation</link>
      <pubDate>Fri, 24 Aug 2012 16:54:29 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/08/nepomuk-technical-documentation</guid>
      <description>Nepomuk Technical Documentation</description>
      <content:encoded><![CDATA[<div class="document">
<p>I have been working a lot to improve Nepomuk's technical documentation on the <a class="reference external" href="http://techbase.kde.org/Projects/Nepomuk">techbase</a>. It has now finally reached the point where I think it covers most of the major aspects of Nepomuk. If you're using Nepomuk in your applications, you should read it.</p>
<img alt="/images/nepomuk.png" class="align-center" src="/images/nepomuk.png" />
<p>It's a little hard for me to objectively state if I've explained stuff properly considering that I have been so involved in the development. If you feel some section is lacking or you're having trouble understanding it, please contact me. I'll be happy to update it.</p>
<p>And with this, I can (somewhat) tick off one item from the <a class="reference external" href="http://community.kde.org/Projects/Nepomuk/Akademy_2012_BOF">Nepomuk BOF todo list</a>.</p>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Faster Nepomuk Queries</title>
      <link>http://vhanda.in/blog/2012/08/faster-nepomuk-queries</link>
      <pubDate>Tue, 21 Aug 2012 22:03:59 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/08/faster-nepomuk-queries</guid>
      <description>Faster Nepomuk Queries</description>
      <content:encoded><![CDATA[<div class="document">
<p>Nepomuk has a very decentralized architecture where the different components exist as different processes. They are all variants of the same executable - <tt class="docutils literal">nepomukservicestub</tt>. This servicestub loads the appropriate service plugin. The main reason for doing this was stability. If one of the components crashes, then it doesn't take all the other components with it.</p>
<p>Unfortunately this architecture doesn't hold up very well when the different components need to communicate with one another. In that case they need to use complex methods such as dbus or local sockets. Another problem is the increased memory consumption, because each process has its own internal cache (Nepomuk stuff) and other KDE-specific stuff.</p>
<img alt="/images/query-storage-separate.png" class="align-center" src="/images/query-storage-separate.png" style="width: 450px;" />
<p>If you <a class="reference external" href="http://www.vhanda.in/blog/2012/08/nepomuk-without-files/">ignore file handling</a> in Nepomuk, we have two main services -</p>
<ul class="simple">
<li>Storage Service</li>
<li>Query Service.</li>
</ul>
<p>The Storage Service is responsible for managing the ontologies, initializing virtuoso, and other data management functions. The Query Service exists for caching queries and running them in a separate thread.</p>
<p>Now the Query Service obviously needs to access the virtuoso database, and for that it needs to go through the Storage Service. This communication happens through a local socket, the same socket that all other applications use to access Nepomuk.</p>
<p>Last week, I finally merged the query service into the storage service.</p>
<img alt="/images/query-storage-merged.png" class="align-center" src="/images/query-storage-merged.png" style="width: 450px;" />
<p>I was aiming for a small memory decrease, and a slight performance upgrade on the queries. Boy, was I wrong! The additional local socket seems to have been a huge bottleneck.</p>
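<p>To make the extra hop concrete, here is a rough Python sketch of the two layouts. It is only an illustration under loud assumptions: the services are threads rather than processes, the protocol is a made-up one-line text exchange, and <tt class="docutils literal">Channel</tt>, <tt class="docutils literal">start_service</tt> and <tt class="docutils literal">storage_lookup</tt> are hypothetical names, not Nepomuk code.</p>

```python
import socket
import threading


class Channel:
    """One end of a local socket pair, speaking newline-terminated text."""

    def __init__(self, sock):
        self.sock = sock
        self.reader = sock.makefile("r", encoding="utf-8")
        self.writer = sock.makefile("w", encoding="utf-8")

    def request(self, text):
        # Send one request and block until the one-line reply arrives.
        self.writer.write(text + "\n")
        self.writer.flush()
        return self.reader.readline().rstrip("\n")

    def serve(self, handler):
        # Answer requests until the other end closes the connection.
        for line in self.reader:
            self.writer.write(handler(line.rstrip("\n")) + "\n")
            self.writer.flush()


def start_service(handler):
    """Run handler behind a socket in a background thread; return the client end."""
    client_end, service_end = socket.socketpair()
    worker = threading.Thread(
        target=Channel(service_end).serve, args=(handler,), daemon=True
    )
    worker.start()
    return Channel(client_end)


def storage_lookup(query):
    # Hypothetical stand-in for the storage service running a query.
    return "result for " + query


# Old layout: the query service forwards every query over a second socket.
storage = start_service(storage_lookup)
separate = start_service(storage.request)

# Merged layout: the query service calls the storage code directly.
merged = start_service(storage_lookup)

print(separate.request("tag:kde"))  # crosses two sockets
print(merged.request("tag:kde"))    # crosses one socket
```

<p>Both calls return the same answer; the only difference is how many times the request and the reply get written to and read from a socket. The sketch shows the shape of the change and nothing about its size.</p>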
<p>Here are some benchmarks listing about 12,500 resources.</p>
<img alt="/images/queryservice-benchmarks.png" class="align-center" src="/images/queryservice-benchmarks.png" style="width: 700px;" />
<p>There are still many more performance upgrades that can be done, but this seemed like a good place to start :)</p>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Nepomuk Without Files</title>
      <link>http://vhanda.in/blog/2012/08/nepomuk-without-files</link>
      <pubDate>Mon, 06 Aug 2012 18:04:21 IST</pubDate>
      <category><![CDATA[uncategorized]]></category>
      <guid>http://vhanda.in/blog/2012/08/nepomuk-without-files</guid>
      <description>Nepomuk Without Files</description>
      <content:encoded><![CDATA[<div class="document">
<p>Most people assume that if they switch off file indexing in Nepomuk, then all the Nepomuk file services will get disabled. This is, however, not the case. Nepomuk has two services that deal with files -</p>
<ul class="simple">
<li>Nepomuk File Watcher</li>
<li>Nepomuk File Indexer</li>
</ul>
<p>The Nepomuk File Indexer is responsible for calling the <a class="reference external" href="https://projects.kde.org/projects/kdesupport/strigi/libstreamanalyzer">strigi plugins</a> to index the files, whereas the FileWatch service is a general service that monitors file move, creation and deletion events. Even when the <em>File Indexer</em> does not exist, files may have metadata attached to them - Tags, Rating and Comments. We need the File Watcher to update our database whenever the url of a file changes.</p>
<p>The File Watcher internally uses a kernel API for file monitoring - inotify. This API, while quite easy to use, does not allow us to recursively watch directories, and <em>more importantly</em>, does not provide file move events unless we are watching both the source and the destination directory.</p>
<p>We need file move events in order to track a file's url. This results in us having to create inotify watches for every single directory in your $HOME folder. This causes a large disk load on startup and is the cause of one of the <a class="reference external" href="https://bugs.kde.org/show_bug.cgi?id=233471">critical bugs</a> in Nepomuk. And we have no solution, until the kernel provides us with a better API.</p>
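<p>If you are curious how many watches that means on your machine, a minimal Python sketch along these lines should give a rough number. It only counts directories and does not create any inotify watches; <tt class="docutils literal">count_watches_needed</tt> is a hypothetical helper name, and the real File Watcher may well skip some folders that this count includes.</p>

```python
import os


def count_watches_needed(root):
    """Count the directories at or below root.

    A non-recursive watcher needs one watch per directory, so this is
    the number of watches required to cover the whole tree.
    """
    # os.walk yields one entry per directory it visits, starting with
    # root itself. Symlinked directories are not followed by default.
    return sum(1 for _ in os.walk(root))
```

<p>Calling <tt class="docutils literal"><span class="pre">count_watches_needed(os.path.expanduser("~"))</span></tt> walks your whole home folder, so expect it to take a while on a large one, which is a small taste of the startup disk load described above.</p>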
<p>Anyway, Nepomuk is being used in KDE PIM and Telepathy (development version), and none of those use cases have anything to do with files. It doesn't make sense to make others pay the price of the file watcher when they are not doing anything related to files. So, with that in mind, please add the following lines to your nepomukserverrc, if you do not care about files at all -</p>
<pre class="literal-block">
[Service-nepomukfilewatch]
autostart=false
</pre>
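<p>If you would rather script the change than edit the file by hand, something like the following Python sketch might do. Treat it as an assumption-heavy example: <tt class="docutils literal">disable_file_watcher</tt> is a hypothetical helper, you have to pass in the location of your own nepomukserverrc, and Python's <tt class="docutils literal">configparser</tt> is only an approximation of the KDE config format. It drops comments and refuses files that have keys before the first group, so keep a backup.</p>

```python
import configparser


def disable_file_watcher(rc_path):
    """Set autostart=false for the file watcher in the given rc file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the case of existing keys untouched
    parser.read(rc_path, encoding="utf-8")  # a missing file is simply skipped

    section = "Service-nepomukfilewatch"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "autostart", "false")

    # Write key=value without spaces, matching the snippet above.
    with open(rc_path, "w", encoding="utf-8") as rc_file:
        parser.write(rc_file, space_around_delimiters=False)
```

<p>The service setting is read at startup, so the change should take effect the next time Nepomuk is started.</p>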
<div class="section" id="warning">
<h1>Warning</h1>
<p>With the Nepomuk FileWatch Service disabled, you'll still be able to tag and rate your files, but these annotations will be lost if you move or rename the file.</p>
</div>
</div>
]]></content:encoded>
    </item>
  </channel>
</rss>
