As you might have read, the 4.11 release of Nepomuk is a lot faster at writes, and therefore indexing is going to be much faster. However, read performance was nearly the same.
Architecture
All applications communicate with Nepomuk via the Storage Service, which acts like a server. It is responsible for loading the ontologies, managing Virtuoso and sending change notifications. Reads happen over a local socket, whereas writes are sent over D-Bus.
The Storage Service communicates with Virtuoso via ODBC, which internally also uses a local socket. This architecture is quite similar to that of Akonadi.
Cutting the middle man
I initially started with many optimizations in Soprano, where we use ODBC to communicate with Virtuoso. These gave a good 30% increase in performance. However, the largest increase in performance came from removing the local socket communication between applications and the Storage Service.
All applications now communicate directly with Virtuoso for reading data. The writes still go through the Storage Service so that we can do type checking and send the change notifications.
Performance Increase
Removing the storage service from the middle results in a performance increase of 6-7x. For example - listing 50000 results goes down from ~20 seconds to about 2.5 seconds.
Additionally, since the applications now communicate directly with the database, the CPU load of the Storage Service also goes down a lot. It no longer has to serialize and deserialize data.
With the 4.10 release of Nepomuk, we decided to move away from Strigi and write our own indexers. We support most of the commonly used formats. Also, the new code is faster and more importantly more maintainable and easier to contribute to. So far this decision has worked pretty well for us.
That being said - we still do not have enough indexers. Over the last week I managed to write simple ODF and Office2007 indexers, but we still need some more.
Of the most commonly used file formats, we are still missing indexers for -
Ebook file formats
Office 97-2003 file formats (.doc)
If you have some time and would like to help out, writing an indexer is a perfect way to get involved. Writing a plugin involves finding the appropriate library to parse the file format and extracting the plain text and basic metadata such as author, title, etc.
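To make the shape of such a plugin concrete, here is a rough sketch in plain C++. The Extractor interface, its method names and the toy PlainTextExtractor are hypothetical and only illustrate the "declare mimetypes, return text and metadata" idea; the real plugin interface in nepomuk-core uses Qt and Nepomuk types and looks different.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical plugin interface, for illustration only.
class Extractor {
public:
    virtual ~Extractor() = default;
    // Mimetypes this plugin claims to handle.
    virtual std::vector<std::string> mimetypes() const = 0;
    // Property -> value pairs pulled out of the raw file contents.
    virtual std::map<std::string, std::string> extract(const std::string& contents) const = 0;
};

// Toy extractor for plain text: stores the text and its line count.
class PlainTextExtractor : public Extractor {
public:
    std::vector<std::string> mimetypes() const override { return {"text/plain"}; }

    std::map<std::string, std::string> extract(const std::string& contents) const override {
        std::size_t lines = 0;
        for (char c : contents) {
            if (c == '\n') ++lines;
        }
        return {{"nie:plainTextContent", contents},
                {"nfo:lineCount", std::to_string(lines)}};
    }
};
```

A real indexer would hand the file to a parsing library at the point where this sketch just counts newlines.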
The MS Office 97-2003 formats are slightly harder because of the lack of a good high level library to parse the binary formats. However, we do have the wv library which is a little low level, but can still be used.
It would be great to have these indexers for 4.11.
A couple of days ago I talked about how we have been clearing up some unwanted data in Nepomuk for the 4.11 release - mainly graphs. This change comes with a performance increase of over 100% in many cases, and makes the codebase simpler and easier to maintain.
Unfortunately, it comes at a cost.
The graphs in the old database need to be merged into a small number. This is a very time consuming operation because merging graphs is equivalent to slowly removing your entire database and reinserting it.
Given that all users will have to go through this migration, we decided to add some additional methods of migrating.
1. Backup Tags and Ratings - The user can choose to back up only their file tags and ratings, remove the entire database, and then restore the tags and ratings. The graph merging process is only performed on the tags and ratings when creating the backup.
This process is very fast and is the recommended way.
Once this has been performed, all your files and emails will need to be indexed again, which is actually a good thing because historically a lot of the indexed data has been quite inferior.
2. Migrate the existing Data - We can obviously go through the slow process of migrating all of the graphs. This can however easily take a couple of hours for medium sized databases (2.5 GB). I would not recommend this unless you have some really important data that you added on your own that option (1) does not cover.
3. Start Afresh - Just remove the existing database and start with a fresh Nepomuk installation.
So far, the user is given this choice the first time Nepomuk runs in 4.11. It’s a little ugly, but that can be fixed.
I’m hoping that this will not be too much of a pain and that the users can just click through the wizard using the default option (1). For medium sized databases this entire process gets done in just a couple of minutes.
On the positive side, because of this migration I had a chance to fix and test Nepomuk Backup - Just backing up the Tags and Ratings works very well right now. Backing up the full Nepomuk system still needs to be tested.
I recommend that developers check out the feature/mergeGraphs branch, try out the migration, and just use that branch of Nepomuk for a while. When I’m confident about the migration and the internal changes, I’ll merge it into master.
Since I’ve become the maintainer of Nepomuk we have put a strong emphasis on performance and stability. One of the core parts of Nepomuk is the set of high level operations that are exposed to the applications. These operations are typically used to insert data into Nepomuk and modify it. Each of these operations is quite complex and involves a number of complicated queries.
For this 4.11 release we wanted to simplify that code and make it more efficient. This blog post delves into the technical details of what has changed, and then finally goes into how that affects the users.
Graphs
It is often said that Nepomuk operates on triples in the format -
<subject> <predicate> <object>
While this is true, it skips over the fourth parameter which is the context or graph. Statements in Nepomuk are actually of the form -
<subject> <predicate> <object> <graph>
This fourth parameter allows us to store some additional information about the statement, such as when it was added and who added it. The Nepomuk project has had a history of saving a lot of data without a clear use case in mind. Graphs are a prime example of that.
Each statement does not have its own graph, rather a group of statements are clubbed together in one graph.
Before the 4.8 release, a new graph was created every 200 msecs. This graph just contained the creation date of the graph. This involved the insertion of 4 statements -
<graph> a nrl:InstanceBase .
<graph> nao:created “dateTime” .
<metaDataGraph> a nrl:GraphMetadata .
<metaDataGraph> nrl:coreGraphMetadataFor <graph> .
These 4 statements have never seen any use.
After the 4.8 release we introduced a set of central asynchronous APIs which performed a lot of the higher level functions so that applications would not have to deal directly with statements.
This new API created a new graph each time a call was made to any of these higher level functions. The graphs that were now created were slightly more useful. They contained the following information
creation date
creating agent
The “Agent” in this case is the application that sends the command for the data to be added. This extra “Agent” field allowed us some nice operations such as removeDataByApplication, which allows applications to remove only the data they have added.
This functionality was already present for the file indexer in the pre-4.8 days. It was then generalized and made applicable to all applications.
The bottleneck in this plan is the creation of a new graph for each command. Even adding a simple property would result in a good 5-10 insert calls. This effectively kills our performance. For the 4.10 release I managed to combine a number of these insert calls and optimize the code, but we were still doing a large number of writes which served no purpose. We have never used the creation date of a graph in any way.
Additionally, we had to perform complex queries to make sure the data is always present in one graph, and check for empty graphs so that they could be removed. Overall, it was quite messy.
New Graph Handling
With this 4.11 release I have simplified the concept of graphs. Now there are a limited number of graphs based on the number of Agents that push data into Nepomuk. Each Agent gets its own graph. This way we can still easily implement ‘removeDataByApplication’, and decrease the complexity of our code base.
This greatly simplifies the internals of the Nepomuk code base since we no longer need to worry about all the complicated graph handling.
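The one-graph-per-Agent idea can be sketched in a few lines of plain C++. This is only an illustration under assumptions: QuadStore, graphFor and the "graph:" naming are hypothetical, and the real storage service works against Virtuoso rather than an in-memory list.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One statement: subject, predicate, object, plus the graph it lives in.
struct Statement {
    std::string subject, predicate, object, graph;
};

// Hypothetical in-memory store, for illustration only.
class QuadStore {
public:
    // Every agent gets exactly one graph, created on first use.
    const std::string& graphFor(const std::string& agent) {
        auto it = m_graphs.find(agent);
        if (it == m_graphs.end())
            it = m_graphs.emplace(agent, "graph:" + agent).first;
        return it->second;
    }

    void addProperty(const std::string& agent, const std::string& subject,
                     const std::string& predicate, const std::string& object) {
        m_statements.push_back({subject, predicate, object, graphFor(agent)});
    }

    // Removing an application's data means dropping everything in its graph.
    std::size_t removeDataByApplication(const std::string& agent) {
        const auto it = m_graphs.find(agent);
        if (it == m_graphs.end())
            return 0;  // this agent never pushed any data
        std::size_t removed = 0;
        std::vector<Statement> kept;
        for (const Statement& st : m_statements) {
            if (st.graph == it->second) ++removed;
            else kept.push_back(st);
        }
        m_statements.swap(kept);
        return removed;
    }

    std::size_t size() const { return m_statements.size(); }
    std::size_t graphCount() const { return m_graphs.size(); }

private:
    std::map<std::string, std::string> m_graphs;  // agent -> graph name
    std::vector<Statement> m_statements;
};
```

The point of the sketch is that no per-call graph metadata is written: adding a property is a single insert, and the number of graphs grows with the number of agents, not with the number of calls.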
This big change is still in a feature/mergeGraphs branch. I’m still not completely ready to merge it into master.
Benchmarks
These are initial benchmarks that were taken about a month ago. There is still scope for more optimizations, especially if we combine more of our SPARQL calls. Also, these benchmarks were run on a blank database. The difference should be a lot larger when there is some real world data.
Function                                4.9   4.10   4.11
addProperty()                           130     79     32
addProperty_sameData()                   60     34     32
setProperty()                           150    107     40
setProperty_sameData()                  145     94     41
storeResources()                         85     52     21
storeResources_email()                  300     85     77
createResource()                         26     14      8
removeResources()                        49     30     16
removeDataByApplication()                82    182     54
removeDataByApplication_subResources     89    171     58
removeAllDataByApplication()             84    375     55
The numbers are in msecs. The functions are the higher level functions that all applications use to push any data into Nepomuk.
Whenever you add a tag in Nepomuk, the addProperty/setProperty methods are called. The file indexers and PIM feeder mostly use the storeResources function to push new data and removeDataByApplication to remove existing indexing data.
If you look at the results you’ll notice a substantial improvement with 4.10, except for the removeDataByApplication functions, which seem to have taken a severe hit. I’m not too sure why this has happened; the only change in that code has been one bug fix, which should have increased performance.
As noted above, the number of graphs in Nepomuk is now limited and we no longer create superfluous graphs. However, all those extra graphs are still present and need to be merged into a finite number. This can be a time consuming process.
Tomorrow, I’ll go into the details of how we plan to counter that.
When something goes wrong in Nepomuk, it’s easy for us Nepomuk developers to track it down, but for other developers and users it can be quite hard. Even simple things like reporting which component is malfunctioning aren’t completely obvious.
Over the last month, we have simplified some of the external details and added tools which will help us debug your problems so that we can fix things more easily. These will all be shipped with nepomuk-core in 4.11.
Nepomuk2::Service2
Nepomuk, like most modular architectures, has a number of different plugins or as we like to call them “services”. Traditionally each service would be installed as a library that would be loaded by the nepomukservicestub process. When most users would try to provide debugging information they mostly just provide the process name - nepomukservicestub. This doesn’t tell us much, since all the heavy lifting is done by the nepomuk services. The client libraries are mostly just light wrappers.
With the 4.10 release we have 3 major services -
Storage Service
File Watch Service
File Indexing Service
Each can be started by calling nepomukservicestub along with the service name, for example: nepomukservicestub nepomukfilewatch.
Currently in master, we have moved away from this approach and each service now installs its own process. So you should no longer see any nepomukservicestubs. Instead you’ll see a nepomukstorage, nepomukfilewatch and nepomukfileindexer process.
This greatly simplifies the debugging process as the users can easily report which process is problematic, and starting a service is just a matter of running the correct executable.
Tools
Restarting Nepomuk and looking into the database has traditionally required some dbus commands. These commands were always apparent to us developers, but it’s good to have some standard ways of managing Nepomuk.
NepomukCtl
Thanks to Gabriel, we now have nepomukctl, which works much like akonadictl. It can be used to start, stop and restart Nepomuk or any individual service.
This is a great tool to check if Nepomuk is actually running.
Nepomuk Show
In order to view the data inside the Nepomuk database one typically needs to issue a query. This requires the developers to know SPARQL. Typically users do not want to go into so much effort when they are debugging simple stuff.
Now nepomukshow can be used to easily view the resource information.
This is a great tool for seeing what information has been indexed about a file, or whether a file has been indexed at all. It can even be used to check if an email has been indexed, though the syntax is a little different.
$ nepomukshow 'akonadi:?item=39618'
<nepomuk:/res/b8ef2a3f-9112-4dfb-9071-4b1ce7544b1b>
rdf:type aneo:AkonadiDataObject
rdf:type nmo:Email
nao:created 2012-12-11T10:29:20Z
nao:hasSymbol internet-mail
nie:isPartOf <akonadi:?collection=44>
nie:byteSize 3998
nie:url <akonadi:?item=39618>
nmo:isRead 1
nmo:to nepomuk:/res/cbee003e-7a36-4dd3-8978-c71c7c91d359
aneo:akonadiItemId 39618
nmo:messageSubject Re: proposals marked as to be accepted in Melange now
nmo:sentDate 2011-04-18T16:45:59Z
nmo:from nepomuk:/res/b900fd2e-dacf-4395-bb72-6c9ed78f5b71
nmo:messageId <BANLkTikYWvzvGCM4-N0cOw0NdVdO8BmOpg@mail.gmail.com>
NepomukCmd
Most developers already know of nepomukcmd which was an alias on top of sopranocmd. It could be used to query Nepomuk and contained many more soprano specific features which are now no longer applicable.
We are now shipping our own nepomukcmd tool, which currently only supports SPARQL queries. It does however support a neat --inference option which can be used to selectively enable and disable inferencing.
These tools right now are in a very early, simple but working state and could use some polishing, and extra features in the future. Contributing to these would be a great way to get involved in Nepomuk. Message me for more details.
I've blogged about some of the more prominent changes in this new Nepomuk release. I thought it would be a good idea to document all the changes, which Nepomuk has gone through thanks to Blue Systems!
File Indexing
As the release announcement has been saying, the file indexer has undergone the most changes.
New Double Queue Architecture
We've split the working of the indexer into two parts - The first basic indexing and second full file indexing. The basic indexing quickly indexes the basic information about the file such as the filename and mimetype. This allows us to always at least answer simple queries. The other queue, which is only run when the user is idle, extracts the full information about the file.
New File Indexer
We've had some problems with Strigi earlier. With 4.10, we have finally decided to release our own solution. Our solution is arguably technologically inferior, but it's more maintainable and, for now, provides a better user experience.
Mimetype Filtering
One of the advantages of moving to this new file indexing architecture is that mimetypes play a central role. All of the file indexing plugins use mimetypes to identify which types of files they can index. Building on this, we decided to let the user control the types of files that are indexed.
By default, source code is now no longer indexed. Common stuff like Documents, Images, Audio and Videos are.
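A rough sketch of how such a filter could work, in plain C++. The function names and the prefix-matching rule are assumptions made for illustration, and the excluded mimetypes below are examples rather than the list Nepomuk actually ships.

```cpp
#include <string>
#include <vector>

// Hypothetical filter: a mimetype is skipped when it starts with any excluded prefix.
bool shouldIndex(const std::string& mimetype,
                 const std::vector<std::string>& excludedPrefixes) {
    for (const std::string& prefix : excludedPrefixes) {
        if (mimetype.compare(0, prefix.size(), prefix) == 0)
            return false;
    }
    return true;
}

// Illustrative default that leaves source code out of the index.
std::vector<std::string> defaultExcludes() {
    return {"text/x-csrc", "text/x-c++src", "text/x-python"};
}
```

Matching on prefixes means a whole family such as "audio/" can be switched off with a single entry, while documents, images and videos stay indexed.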
KioSlave changes
Till the 4.9 release, the kioslave code hadn't changed much. With 4.9.1, we managed to optimize some of the code. The 4.10 release however takes this to an entirely different level.
Massive Optimizations
The 'nepomuksearch' tagging slave could initially show both non-file and file data. This means that it would also occasionally show contacts, albums and other details. Selecting any of those would result in another search for resources related to that contact. For this release, we decided to optimize for the most common use case of listing files.
The 'nepomuksearch' kioslave, and all other nepomuk kioslaves, now no longer show any result which does not have a URL. This coupled with a LOT of other optimizations, has now yielded a super fast kioslave which can display thousands of results in under a second.
As previously stated, we are also introducing a new tagging kioslave. This slave allows you to easily manage your Nepomuk tags, and browse files based on the tags they carry.
File MetadataWidget
One of the largest parts of the Dolphin Information Panel was the KFileMetadataWidget, which was provided by kdelibs/kio. This widget was one of the last parts of Dolphin that still used Nepomuk1. Since kdelibs was frozen, we couldn't port it to Nepomuk2. Thus emerged the Nepomuk2::FileMetadataWidget in nepomuk-widgets.
The KFileMetadataWidget historically fetched all the data in another process. This was done because Strigi was a little unreliable. With KDE Workspaces 4.10, we are no longer using Strigi in Nepomuk. This means the widget now uses the nepomukindexer to extract the data. It also no longer uses this multi-process architecture when loading the Nepomuk data. This results in a massive performance improvement because we can rely on the Nepomuk cache in Dolphin, instead of recreating it each time.
In terms of appearance, the widget has become a little more uniform, and by default only shows the properties that really matter.
Improved Removable Media Handling
Nepomuk has for quite some time supported indexing of removable media. However, it didn't always work that well. From a design point of view, the solution was great and extremely robust. This, however, came at a steep cost for the rest of Nepomuk. Every other query was affected by these features, and not in a small way. For some simple tests of basic indexing, it made a difference of around 20%.
With this new release, we have gone to a simpler solution which has a lighter performance cost. We have also removed the "Automatic Invalid File Metadata Cleaner" which removed the metadata for any file it could not access. The client code now always checks if the file can be accessed before displaying it to the user.
Nepomuk Backup Changes
With KDE Workspaces 4.6, my Google Summer of Code Project, Nepomuk Backup, was finally merged. It was a very ambitious project which attempted to synchronize, backup and restore data in a non-destructible manner. In the end, it was just a little bit too complex. Large parts of the synchronization code eventually migrated into the data feeding code, which is now used by anyone pushing data into Nepomuk. So, it wasn't a complete loss.
With this new release, I finally got around to throwing away most of the complex code, and implementing a very simple and reliable backup solution. This new method does not require a separate service to be running, and therefore consumes less memory. Additionally, we also have some basic unit tests to ensure that the backups are restored properly!
Please keep in mind that this only backs up the non-destructible data. This does not include the file or email index information. If you want that to be backed up, you're better off just making a copy of the database file.
Nepomuk Cleaner
The Nepomuk Cleaner originated from a series of scripts I was writing to clear up my own database. It eventually occurred to me that other people might suffer from the same problem. The scripts were eventually combined into a cohesive form, and released. The application is very simple right now, but that will change in future releases. I even contemplated not releasing it for 4.10, but it clearly provides some value, even if it doesn't look that great.
Other Changes
Surprisingly, I didn't want to include many new features in this release. I was trying to focus more on stabilization. Over the last 6 months, a total of 246 bugs have been resolved, out of which 188 were reported within the last 6 months. This seems like a good improvement to me.
Apart from these simple changes there have been a number of optimizations all across Nepomuk and Soprano. Nepomuk should be running faster and better than ever before. In some cases we have even seen an over 200% increase in performance.
Nepomuk has long required a convenient way of managing tags. I've previously tried this with a simple Tag Managing Application, but that wasn't something that we wanted to ship. For the KDE Workspace 4.10 release, we are releasing a Nepomuk Tags kioslave.
Listing Tags
The kioslave provides a very convenient way of listing all the tags. You can even rename and delete tags, just like you would for any other folder.
Browsing Files
Nepomuk has always provided users a way to browse tags, but it was only one tag at a time. This seemed fairly limiting. One could browse by more tags, but then you would have had to write the query yourself.
With this kioslave, you can finally browse the files based on the tags, and then filter the search even more by selecting more tags.
Applying Tags
The kioslave also supports adding of tags in bulk. Just drag and drop (or copy) the files into the tagged folder, and the appropriate tags will be applied.
Nepomuk has a unique problem of maintaining an RDF store. Unlike traditional SQL based stores, RDF offers a very loose schema, which is a HUGE advantage. Unfortunately all of the current RDF stores do not support any form of schema enforcement. It's up to the client code to make sure that the data being pushed is valid.
This has resulted in a number of problems such as strings being stored where an integer should go.
With the KDE Workspace 4.7 release, we started employing our own form of schema enforcement in the Nepomuk Storage Service, but the old incorrect data still remains. Also, as Nepomuk has evolved as a project, we have found better ways to store data. Since the schemas are so loose, we could easily store both the old and the new data without any problems on the database level. This obviously results in more complex client code which has to handle both legacy and new data.
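As a toy illustration of what this kind of enforcement means, the plain C++ sketch below rejects a value whose form does not match the range declared for its property. The Schema type, the two-entry Range enum and the function names are hypothetical; the actual checks in the Storage Service are driven by the loaded ontologies and cover far more than this.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Literal types the sketch knows about.
enum class Range { Integer, String };

// Hypothetical slice of an ontology: property -> expected literal type.
using Schema = std::map<std::string, Range>;

// True when the text is an optionally signed run of decimal digits.
bool looksLikeInteger(const std::string& text) {
    const std::size_t start =
        (!text.empty() && (text[0] == '-' || text[0] == '+')) ? 1 : 0;
    if (start == text.size())
        return false;  // empty, or a sign with nothing after it
    for (std::size_t i = start; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9')
            return false;
    }
    return true;
}

// Reject unknown properties and values that do not match the declared range.
bool isValid(const Schema& schema, const std::string& property,
             const std::string& value) {
    const auto it = schema.find(property);
    if (it == schema.end())
        return false;
    return it->second == Range::String || looksLikeInteger(value);
}
```

With a check like this on the write path, a string such as "forty-two" can no longer end up in an integer property, which is exactly the kind of legacy data the old databases still contain.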
For this release, we decided to clean up the code to a certain extent and stop supporting some of the legacy data. We also decided to ship a very basic application called the "Nepomuk Cleaner".
This application is responsible for porting any legacy data, clearing up incorrect data, and merging duplicate data. We recommend that all users run it at least once. It will result in a performance improvement in all areas of Nepomuk, including a significant impact on the indexing speed of emails.
With the 4.11 release, we're planning to improve the interface, add more cleaning jobs, and make running this application mandatory. That way we can safely remove all the legacy code paths.
Over the last 6 months, while I have been working for Blue Systems, Nepomuk has undergone a number of changes. The most public and noticeable change has been the major refactoring of the file indexer. One large part of this has been the migration away from Strigi. The other large part is the introduction of two-phase indexing.
With 4.9 and earlier, the file indexing service used to just have one queue, whose speed could be controlled. This queue was filled on startup by comparing the mtime of the file with the one stored in the database. This would involve scanning through all the indexed folders. Once the scan was complete it would also listen to the file watcher to be notified when a file is modified or created.
This architecture had some shortcomings -
Indexing each file is a time consuming process, and it involves extracting and pushing large amounts of data in Nepomuk.
Since this process was slow and we did not want to annoy the users, artificial delays were introduced, which varied based on whether the user was idle.
The entire indexing process was suspended when on battery
Faulty files which cannot be indexed do not have any information stored, and could not even be searched by filename
With this new release, we have split the indexing into 2 parts - Basic Indexing and File Indexing. The basic indexing just extracts the stat information and mimetype of the file. Whereas, the file indexing actually extracts data from the file.
This basic indexing is always enabled, and is very fast. It can process around 10-20 files per second. Also, it consumes very little CPU. Extracting this basic information first allows us to search by type and filename, and enables the timeline kioslave to work properly.
The file indexing is the relatively heavy process that is only run when the user is idle, by default.
This two phase architecture allows us to still index all the files, while providing a relatively light burden to the user. It also allows us to provide finer control than a simple on/off switch. For example - Now when on battery, file indexing is disabled, but the simple indexing still continues.
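The scheduling rule described above can be sketched in plain C++ as two queues. IndexScheduler and its methods are hypothetical names for illustration, and the idle and battery flags are assumed to be supplied by the caller; the real service is event driven and built on Qt.

```cpp
#include <deque>
#include <string>

// Hypothetical scheduler for the two indexing phases.
class IndexScheduler {
public:
    // A new or modified file is queued for the basic phase.
    void enqueue(const std::string& path) { m_basic.push_back(path); }

    // Process one file and report what was done, e.g. "basic:/home/a.txt".
    // Returns an empty string when nothing is allowed to run right now.
    std::string step(bool userIdle, bool onBattery) {
        if (!m_basic.empty()) {
            // Basic indexing always runs: stat information and mimetype only.
            const std::string path = m_basic.front();
            m_basic.pop_front();
            m_full.push_back(path);  // content extraction happens later
            return "basic:" + path;
        }
        if (!m_full.empty() && userIdle && !onBattery) {
            // The heavy phase waits for an idle user on mains power.
            const std::string path = m_full.front();
            m_full.pop_front();
            return "full:" + path;
        }
        return std::string();
    }

private:
    std::deque<std::string> m_basic;  // phase 1 queue
    std::deque<std::string> m_full;   // phase 2 queue
};
```

Because every file passes through the basic queue first, a file that later breaks the full extraction still has its filename and mimetype stored and remains searchable.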
This new approach will also allow us to provide more user feedback in future releases, such as an indexing progress bar.
Summary
The new architecture is much faster and more resilient to abnormal files and faulty plugins. It tries to save the basic information first, so that one can easily answer simple queries. The full file information is stored later, when the user is idle.
Strigi has always been a large part of Nepomuk. In fact a lot of users still do not understand the difference between the two. It's quite common to see bug reports mentioning "Strigi/Nepomuk". Lots of blog posts do the same.
Strigi consists of a number of different parts. In Nepomuk we just used to use libstreams and libstreamanalyzer. These were pure C++ libraries. The great thing about Strigi is that it is based on streams, instead of files. So one can theoretically even extract metadata from the album image embedded inside an audio file. It's very powerful. Unfortunately, everything comes at a price, and this increased "awesomeness" comes with increased complexity. Additionally, with it being a pure C++ (no Qt or KDE) library, contributing is harder.
For 4.10, we decided to make a very drastic change and move away from Strigi. There are a large number of reasons for doing so. Apart from the technical ones there was also an economic one - a large code base like Strigi is difficult to maintain and comes with a lot of added complexity.
Our own solution is based only on files (not streams) thereby making it a lot simpler. It directly uses the Nepomuk and KDE libraries, thereby making integration very simple. Integrating Strigi in Nepomuk required a lot of code.
This new file indexer currently resides in the nepomuk-core repository and does not have a public interface. I'm still debating whether it should be public for 4.10. If it gets a public interface, one could theoretically write plugins in other languages.
So far we have 5 indexers -
Image File - Based on Exiv2
Video Files - Based on ffmpeg (We might move to gstreamer)
Audio Files - Taglib
PDF Files - Poppler
Plain Text files
Writing file indexers for Nepomuk is now very simple. In fact these 5 indexers combined are just 500 lines. Here is the important part of the plain text extractor -
// Read the whole file into memory as text
QTextStream ts( &file );
QString contents = ts.readAll();

// Basic statistics about the text
int characters = contents.length();
int lines = contents.count( QChar('\n') );
int words = contents.count( QRegExp("\\b\\w+\\b") );

// Describe the file as a plain text document and attach the extracted data
SimpleResource fileRes( resUri );
fileRes.addType( NFO::PlainTextDocument() );
fileRes.addProperty( NIE::plainTextContent(), contents );
fileRes.addProperty( NFO::wordCount(), words );
fileRes.addProperty( NFO::lineCount(), lines );
fileRes.addProperty( NFO::characterCount(), characters );
The current file indexers cover most of the commonly used files, but they still need to be polished. So, if you're interested in contributing to Nepomuk, here is your chance.
I've managed to catalog some of the different files that I know we support. Our current indexers support many more formats, they just need to be properly tested.
If you're interested in helping, you can start by running nepomuk-core, manually indexing the different file formats, and updating this page. If you're a developer, feel free to check out nepomuk-core and start writing extractors. I've written a simple guide.
Btw, all of this Nepomuk awesomeness is powered by Blue Systems!
Last week 4.9.1 was tagged, and it should release any day now. A large part of my time was spent bug fixing and stabilizing some of the most user visible features. This blog post highlights some of the more user visible changes.
File Watcher
Apart from minor fixes such as extra checks for buffer overflows, and kernel version checks, we have two major improvements:
Memory Leak
The Nepomuk File monitoring service had a serious memory leak in 4.9.0, which was unfortunately not caught in the beta or RC testing period. Depending on the number of directories present, the file watcher service would consume a good 2 GB of your RAM. Fortunately, many distributions patched their own tarballs before releasing to the public.
Reindexing events
Another bug was that lots of files were being unnecessarily reindexed when they were opened in write mode even though no changes had been made. With 4.9.1 we actually check if the modification time of the file has changed, even when the file notification system tells us otherwise.
This should fix most of the problems of files unnecessarily being reindexed.
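The check itself is tiny. Here is a sketch in plain C++, assuming the mtime recorded at the last indexing run is available in a lookup table; MtimeIndex and needsReindex are hypothetical names, and the real service queries the Nepomuk database instead.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical record of the modification time stored at the last indexing run.
using MtimeIndex = std::map<std::string, std::time_t>;

// A change notification only leads to reindexing when the mtime really moved.
bool needsReindex(const MtimeIndex& indexed, const std::string& path,
                  std::time_t currentMtime) {
    const auto it = indexed.find(path);
    if (it == indexed.end())
        return true;                    // never indexed before
    return it->second != currentMtime;  // unchanged mtime: ignore the event
}
```

A file that was merely opened in write mode and closed again keeps its old mtime, so the notification is dropped instead of triggering a full reindex.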
File Indexer
The File Indexer has also gone through a number of fixes -
Do not check for new files on startup
One of the most annoying things I found about file indexing was the initial check for new files on startup. Even though this check runs silently in the background, it was very annoying, especially for people who run Nepomuk all the time.
Now, we only check if the strigi version has changed, and only then do we check for new files / check for any files that could not be indexed the last time. If someone wants to forcibly check for new files, they can do so with a dbus call -
Or wait till 4.10, when I add an option to do so in the KCM.
Secondary Indexer
This patch technically made it into 4.9.0, but I never got the chance to blog about it. I've introduced a SimpleIndexer which serves as a backup when the strigi indexers provide incorrect data. This way, instead of having no data about the file, we at least have the basic information such as filename, url and mimetype. I have plans for a more concrete solution for 4.10, but that's for another blog post.
Nasty Deadlock
Another annoying bug that was not caught with 4.9.0 was a deadlock that rendered the file indexer service useless.
This only happened during the first run of Nepomuk, and led to some unfortunate publicity. In the end it was a very simple fix once the proper backtrace had been provided.
Strigi Analyzers
Another big change which probably has a large impact on indexing times is the blacklisting of certain file indexing (strigi) plugins. Currently by default, we only blacklist the SHA1 hash generator. With it blacklisted, we do not (hopefully) need to read the full contents of the file. This results in a noticeable performance improvement for large files.
Queries
File Queries
One of my favorite features of Dolphin is their Places Panel with all of those defaults - Documents, Images, Audio and Video files. Unfortunately, those defaults were normal Nepomuk queries and not file queries. With 4.9.1, they have been changed to file queries which lets them access a number of optimizations on our end, and allows them to benefit from my minor optimizations in file queries.
Query Updates
Most applications which currently use Nepomuk utilize the QueryService in order to run queries. The Query Service can then provide updates on those queries if there is some change in the results - It used to do so by running ALL open queries when any data in Nepomuk changes. This was not a good approach, as it obviously does not scale.
With 4.9.1, we are now using some simple heuristics and the Resource Watcher to only re-run queries when affected data changes. So indexing a file will not result in the updating of all custom Nepomuk magic folders.
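One plausible shape for such a heuristic, sketched in plain C++: each open query remembers which properties its results depend on, and a change only re-runs the queries that watch one of the changed properties. This is an assumption made for illustration; OpenQuery and queriesToRerun are hypothetical names, and the actual heuristics in the Query Service may well differ.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical open query: an id plus the properties its results depend on.
struct OpenQuery {
    int id;
    std::set<std::string> watchedProperties;
};

// Return the ids of the queries touched by a change; all others keep their results.
std::vector<int> queriesToRerun(const std::vector<OpenQuery>& queries,
                                const std::set<std::string>& changedProperties) {
    std::vector<int> affected;
    for (const OpenQuery& query : queries) {
        for (const std::string& property : changedProperties) {
            if (query.watchedProperties.count(property) > 0) {
                affected.push_back(query.id);
                break;  // one match is enough for this query
            }
        }
    }
    return affected;
}
```

Compared with re-running every open query on every change, the cost now grows with the number of affected queries rather than with the total number of open ones.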
Nepomuk Core Library
And finally, there have been a number of bug fixes and crash fixes in the NepomukCore library, but considering that they aren't really that user visible, I'm not going to talk much about them.
Miscellaneous
There have been a large number of improvements in the Nepomuk Testing Suite, and more tests have been added. Most of the tests passed, but it still serves as a good way of checking regressions.
Along with that there have been some fixes with Removable Media handling.
Soprano
Recently I blogged about certain major query optimizations, which would only be public in 4.10. This major optimization was done by skipping a large part of the Soprano code. After the blog post, we identified some of the bottlenecks, and optimized them. The most impressive result was a reduction of one query from 6 seconds to 1 second.
Again, this isn't really a part of KDE 4.9.1, but it will be a part of Soprano 2.8.1. So when it releases, make sure you upgrade.
With KDE 4.9, we introduced a new repository called nepomuk-core. This contained a combination of kdelibs/nepomuk and kde-runtime/nepomuk. It was created because of the API freeze present in kdelibs. Considering that most of the client libraries are thin wrappers over the runtime components, it made sense to combine them in one repository.
In order to be compatible with kdelibs, the new library is installed with the Nepomuk2 namespace.
Now with KDE 4.10 we are going to have another new repository nepomuk-widgets. This repository contains the remaining GUI parts of kdelibs/nepomuk that were not moved to nepomuk-core.
Port your Applications
With this repo, we have all of the earlier functionality covered. With 4.10, the Nepomuk libraries will be deprecated. So, port your applications to Nepomuk2. I've updated the wiki with a short script that should take care of most of the changes. You will have to update your CMake files on your own.
Advantages
The kdelibs/nepomuk libraries are in a critical bug-fix state only. That being said, some of the most important classes over there do not have any kind of tests. With Nepomuk2, we have decent test coverage, and active development. Plus, you get access to a number of new asynchronous APIs.
I have been working a lot to improve Nepomuk's technical documentation on the techbase. It has now finally reached the point where I think it covers most of the major aspects of Nepomuk. If you're using Nepomuk in your applications, you should read it.
It's a little hard for me to objectively state if I've explained stuff properly considering that I have been so involved in the development. If you feel some section is lacking or you're having trouble understanding it, please contact me. I'll be happy to update it.
Nepomuk has a very decentralized architecture where the different components exist as different processes. They are all variants of the same executable - nepomukservicestub. This servicestub loads the appropriate service plugin. The main reason for doing this was stability. If one of the components crashes, then it doesn't take all the other components with it.
Unfortunately this architecture doesn't hold very well when the different components need to communicate with one another. In that case they need to use complex methods such as dbus or local sockets. Another problem is the increased memory consumption cause each process has its own internal cache (Nepomuk stuff) and other KDE specific stuff.
The Storage Service is responsible for managing the ontologies, initializing virtuoso, and other data management functions. The QueryService exists for caching queries and running them in a separate thread.
Now the Query Service obviously needs to access the virtuoso database, and for that it needs to go through the storage service. This communication happens through a local socket, the same socket which all other applications use to access Nepomuk.
Last week, I finally merged the query service into the storage service.
I was aiming for a small memory decrease, and a slight performance upgrade on the queries. Boy, was I wrong! The additional local socket seems to have been a huge bottleneck.
Here are some benchmarks listing about 12,500 resources.
There are still many more performance upgrades that can be done, but this seemed like a good place to start :)
Most people assume that if they switch off file indexing in Nepomuk, then all the nepomuk file services will get disabled. This is however not the case. Nepomuk consists of two services which are used to deal with files -
Nepomuk File Watcher
Nepomuk File Indexer
The Nepomuk File Indexer is responsible for calling the strigi plugins to index the files, whereas the FileWatch service is a general service that monitors file move, creation and deletion events. Even when the File Indexer does not exist, files may have metadata attached to them - Tags, Rating and Comments. We need the File Watcher to update our database whenever the url of a file changes.
The File Watcher internally uses a kernel API for file monitoring - inotify. This API, while quite easy to use, does not allow us to recursively watch directories, and more importantly, does not provide file move events unless we are watching both the source and the destination directory.
We need file move events in order to track a file's url. This results in us having to create inotify watches for every single directory in your $HOME folder. This causes a large disk load on startup and is the cause of one of the critical bugs in Nepomuk. And we have no solution, until the kernel provides us with a better API.
Anyway, Nepomuk is being used in KDE PIM and Telepathy (development version), and none of those use cases have anything to do with files. It doesn't make sense to make others pay the price of the file watcher when they are not doing anything related to files. So, with that in mind, please add the following lines to your nepomukserverrc, if you do not care about files at all -
[Service-nepomukfilewatch]
autostart=false
Warning
With the Nepomuk FileWatch Service disabled, you'll still be able to tag and rate your files, but these annotations will be lost if you move or rename the file.
A little while ago, before Akademy, I started implementing the Shared Hash Memory table, so as to improve Nepomuk's architecture. Architectural designs really interest me. I decided to be smart about it: if I was going to refactor some of the main classes, there was a very high chance I would break something. The code is quite complex.
I needed a way to make sure I was on the right track.
The obvious conclusion was that we need comprehensive unit tests for Nepomuk. Unfortunately Nepomuk architecture is such that unit testing is a little hard. Quite often one requires a full blown nepomuk storage service along with a dbus session running, just in order to run simple tests.
Last week, the Test Library was finally merged into nepomuk-core (4.9 and master), and I wrote a set of comprehensive tests for the Resource class, which is the most widely used class in Nepomuk.
The good or bad part (depending on your viewpoint) is that the tests were so comprehensive that only about 20% of the tests actually passed. Ad-hoc testing cannot spot every issue, and automated tests find problems that would otherwise slip past everyone.
A couple of days, and about 20 commits later, we have all the important tests passing. We are making massive steps forward in improving test quality, though there's always more that can be done.
For the last month I've been working on a hash table which is stored in shared memory and can thus easily be used across applications. This is ideal for simple caches of data that reside in multiple applications. My specific use case was the Nepomuk Resource class, which is a glorified cache of key value pairs and uses a hash table. A considerable amount of effort has gone into making sure that each application's Resource classes are consistent with the other applications.
I always thought something this basic would have been implemented, but I just couldn't find a shared memory hash table which actually supported resizing.
Basic Hash Table
Hashing is arguably one of the most important concepts of computer science. If you aren't aware of how it works here are some nice links -
When implementing a hash table in shared memory, one encounters a couple of problems which normal hash tables do not have to deal with. Namely 'named memory locations'. In the Unix world each shared memory location has to be given a unique identifier so that it can be accessed by other applications. Because of that we cannot allocate each Node/Bucket independently.
Most hash tables, which handle collisions by chaining look like this -
Allocating a new named shared memory region for each node seems like quite an overkill.
Structure
Since everything has to fit inside a contiguous memory location, we need to structure the hash table a little differently.
We use two shared memory locations - HashName and HashData. This is done cause a hash table is not fixed in size, and will need to be reallocated. With the reallocations, a new named shared memory will have to be created, and all existing clients would need to be informed to use this newly allocated location which would have a different name.
Instead we use HashName as the unique identifier the client knows about, and HashData is internally used and can be changed when the hash table needs to grow in size.
Hash Name data
The additional integer is a micro optimization. Whenever a client needs to use the hash table, it needs to make sure that it is connected to the appropriate shared memory, and not the old version. The code does that by checking if the HashData name is the same as the one provided by HashName.
This results in a string comparison, which would take a certain number of cycles depending on the length of the string. We use an additional integer to indicate if the string has changed. Integer comparisons are a lot faster than string comparisons.
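A rough sketch of that check, under stated assumptions: the field names, the fixed 64 byte name buffer and the `Client` struct below are made up for the example and are not taken from the real implementation.

```cpp
#include <cassert>
#include <cstring>

// Assumed layout of the small HashName segment: the name of the segment
// that currently holds the data, plus a counter bumped on every resize.
struct HashName {
    int id;
    char dataName[64];
};

// Hypothetical client-side state: which version of HashData we attached to.
struct Client {
    int attachedId;
    char attachedName[64];
};

// Cheap check: a single integer comparison.
bool isStale(const Client& client, const HashName& header)
{
    return client.attachedId != header.id;
}

// The check the integer avoids: comparing the two names byte by byte.
bool isStaleSlow(const Client& client, const HashName& header)
{
    return std::strcmp(client.attachedName, header.dataName) != 0;
}
```

Both functions give the same answer as long as the id is incremented every time the name changes. The first one just does not depend on the length of the name.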
Internal Data
Most of the initial members are quite obvious, but I'm still listing them.
Size: The number of elements in the hash table
Capacity: The total number of elements the hash table can hold
Invalid: The number of buckets that are invalid (have been deleted)
Empty-Bucket: The offset of the next empty bucket which may be used for insertion
After this comes the array of offsets, referred to internally as m_buckets. Instead of holding pointers to the Buckets, like in a traditional hash table, this array holds offsets from the beginning of the next array.
The next array is an array of Buckets, which is internally referred to as m_data. This array holds the key value pairs. A typical Bucket is defined as -
struct Bucket {
    KeyType key;
    ValueType value;
    int hash;   // hash of the key, kept so it can be reused
    int link;   // offset of the next Bucket in the chain
};
The link member is again not a pointer, but it contains the integer offset to the next Bucket from the start of m_data.
Insertions
Insert operations are quite simple. The key is hashed into an integer, which after a modulo operation is used as the index.
The corresponding index is checked in m_buckets. If m_buckets does not already have some value over there, there is no collision and we just allocate a new Bucket and plug in its offset.
Allocations of new buckets are done by consuming the location given by emptyBucket, and then incrementing its value.
If m_buckets does not have an empty value, we go to the corresponding Bucket, and follow its link until the link is empty. At that point we allocate a new node and make the link point to this new Bucket. This approach is almost identical to that of conventional chaining, except that we are using offsets instead of pointers.
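The insertion steps can be sketched as follows. This is a simplified in-process model and not the real shared memory code: two `std::vector`s stand in for the arrays inside HashData, keys and values are plain `int`s, `Empty` (-1) is an assumed marker for "no offset", and overwriting the value of an already present key is my own assumption.

```cpp
#include <cassert>
#include <functional>
#include <vector>

const int Empty = -1; // assumed marker for "no offset"

struct Bucket {
    int key;
    int value;
    int hash;
    int link; // offset of the next Bucket in m_data, or Empty
};

// Mask the sign bit so that the modulo never yields a negative index.
int hashOf(int key)
{
    return static_cast<int>(std::hash<int>{}(key) & 0x7fffffff);
}

struct OffsetHash {
    std::vector<int> m_buckets; // offsets into m_data
    std::vector<Bucket> m_data; // the key value pairs
    int size = 0;
    int emptyBucket = 0;        // next unused slot in m_data

    explicit OffsetHash(int capacity)
        : m_buckets(capacity, Empty), m_data(capacity) {}

    // Consume the slot given by emptyBucket, then increment it.
    int allocate(int key, int value, int hash)
    {
        assert(emptyBucket < static_cast<int>(m_data.size())); // no resize here
        m_data[emptyBucket] = Bucket{key, value, hash, Empty};
        ++size;
        return emptyBucket++;
    }

    void insert(int key, int value)
    {
        const int hash = hashOf(key);
        const int index = hash % static_cast<int>(m_buckets.size());
        if (m_buckets[index] == Empty) {
            // No collision: plug the new offset straight into m_buckets.
            m_buckets[index] = allocate(key, value, hash);
            return;
        }
        // Collision: follow the links until the end of the chain.
        int offset = m_buckets[index];
        while (true) {
            if (m_data[offset].key == key) { // assumption: overwrite duplicates
                m_data[offset].value = value;
                return;
            }
            if (m_data[offset].link == Empty)
                break;
            offset = m_data[offset].link;
        }
        m_data[offset].link = allocate(key, value, hash);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(int key) const
    {
        int offset = m_buckets[hashOf(key) % static_cast<int>(m_buckets.size())];
        while (offset != Empty) {
            if (m_data[offset].key == key)
                return &m_data[offset].value;
            offset = m_data[offset].link;
        }
        return nullptr;
    }
};
```

Because everything is an offset rather than a pointer, the same data stays valid no matter at which address each process maps the shared memory.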
Deletions
Delete operations are fairly similar to insertions. The index is procured by hashing the key, and performing a modulo operation with the capacity. After which m_buckets is used to get the offset of the actual Bucket. Instead of deleting the bucket, which would not be possible cause it is stored in an array, we simply mark it as invalid.
The number of invalid buckets is then updated, and m_buckets[index] is marked as empty. Later during the resize operation, the wasted memory will be cleaned up.
Note: Deletions are actually a little more complex because of collisions, but you just need to traverse the entire link chain, and mark the corresponding Bucket as invalid
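A minimal sketch of a delete that walks the whole chain, again on a simplified in-process model. Two details are assumptions on my part, since the text above does not spell them out: the extra `valid` flag used to mark a Bucket, and unlinking the Bucket from its chain so that the remaining entries stay reachable.

```cpp
#include <cassert>
#include <vector>

const int Empty = -1; // assumed marker for "no offset"

struct Bucket {
    int key;
    int value;
    int hash;
    int link;
    bool valid; // assumption: how a deleted Bucket is flagged
};

struct OffsetHash {
    std::vector<int> m_buckets;
    std::vector<Bucket> m_data;
    int size = 0;
    int invalid = 0;
};

// Removes key from the table. Returns false if the key was not found.
bool removeKey(OffsetHash& table, int key, int hash)
{
    assert(!table.m_buckets.empty());
    const int index = hash % static_cast<int>(table.m_buckets.size());
    int previous = Empty;
    int offset = table.m_buckets[index];

    while (offset != Empty) {
        Bucket& bucket = table.m_data[offset];
        if (bucket.key == key) {
            // Unlink the Bucket from its chain (assumed detail) ...
            if (previous == Empty)
                table.m_buckets[index] = bucket.link; // Empty if it was alone
            else
                table.m_data[previous].link = bucket.link;
            // ... and flag the slot so that the next resize can drop it.
            bucket.valid = false;
            bucket.link = Empty;
            --table.size;
            ++table.invalid;
            return true;
        }
        previous = offset;
        offset = bucket.link;
    }
    return false;
}
```

The slot in m_data is never reused here. It simply stays behind as wasted space until the table is resized.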
Resizing
The resize operation occurs when the loadRatio of the hash table goes above a certain threshold. For now, I've set it to 0.8.
loadRatio = (invalid-buckets + size) / capacity
When the loadRatio goes above 0.8, a new shared memory location is allocated. The metadata (size, capacity, invalid and empty-Bucket) are initially copied to the new shared memory. After which the offset for each bucket is recalculated ( hash % newCapacity ), and it is inserted into the new shared memory.
The invalid buckets are ignored and they are not copied.
Once the copying is completed, the hash data key in HashName is updated and the id is incremented. This is done so that all other applications using the shared memory can update their internal pointers.
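Here is a sketch of the load ratio check and the rehashing step on a simplified in-process model. The `HashData` struct, the `valid` flag and the function names are assumptions for the example. Unlike the description above, the sketch recomputes the metadata while copying instead of copying it first, and it leaves out the shared memory allocation and the HashName update entirely.

```cpp
#include <cassert>
#include <vector>

const int Empty = -1; // assumed marker for "no offset"

struct Bucket {
    int key;
    int value;
    int hash;
    int link;
    bool valid; // assumption: how a deleted Bucket is flagged
};

struct HashData {
    std::vector<int> m_buckets; // its length is the capacity
    std::vector<Bucket> m_data;
    int size = 0;
    int invalid = 0;
    int emptyBucket = 0;
};

// loadRatio = (invalid-buckets + size) / capacity
double loadRatio(const HashData& data)
{
    assert(!data.m_buckets.empty());
    return static_cast<double>(data.invalid + data.size) / data.m_buckets.size();
}

// Builds the replacement table with newCapacity slots.
HashData resized(const HashData& old, int newCapacity)
{
    assert(newCapacity >= old.size && newCapacity > 0);
    HashData fresh;
    fresh.m_buckets.assign(newCapacity, Empty);
    fresh.m_data.resize(newCapacity);

    for (int i = 0; i < old.emptyBucket; ++i) {
        const Bucket& bucket = old.m_data[i];
        if (!bucket.valid)
            continue; // invalid buckets are not copied
        const int index = bucket.hash % newCapacity;
        const int slot = fresh.emptyBucket++;
        // Prepend to the chain of the new index. The order inside a chain
        // does not matter for lookups.
        fresh.m_data[slot] = Bucket{bucket.key, bucket.value, bucket.hash,
                                    fresh.m_buckets[index], true};
        fresh.m_buckets[index] = slot;
        ++fresh.size;
    }
    return fresh;
}
```

Since the invalid buckets are dropped along the way, the new table starts with an invalid count of zero, which is why a resize also cleans up after deletions.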
Iterating
Iterating over the hash table is probably the only operation that is a lot simpler than traditional hash tables. We already have all the data listed as an array - m_data. All we need to do is iterate over it while discarding invalid buckets which were created by delete operations.
Well, in theory.
In practice, safely iterating over a shared hash involves copying the element being accessed. Mainly because another application could have shrunk the entire memory and your index could no longer be valid.
Another possible way is to copy its contents to a QHash. I've implemented a simple function for that.
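The copy-out approach could look roughly like this. It is only an illustration: `std::unordered_map` stands in for QHash, the `valid` flag and the `toMap` name are assumptions, and any locking that a real cross-process copy would need is left out.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Bucket {
    int key;
    int value;
    int hash;
    int link;
    bool valid; // assumption: how a deleted Bucket is flagged
};

// Copies every valid Bucket among the first usedSlots entries of m_data
// into an ordinary, process-local map.
std::unordered_map<int, int> toMap(const std::vector<Bucket>& m_data, int usedSlots)
{
    assert(usedSlots >= 0 && usedSlots <= static_cast<int>(m_data.size()));
    std::unordered_map<int, int> copy;
    for (int i = 0; i < usedSlots; ++i) {
        if (m_data[i].valid) // skip buckets left behind by deletions
            copy[m_data[i].key] = m_data[i].value;
    }
    return copy;
}
```

Once the copy exists, iterating over it is safe even if another application resizes the shared table in the meantime, at the cost of working on a snapshot.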
Problems
Dynamically allocated Types
You cannot use any types which dynamically allocate memory as the key or value. This rules out QString, QUrl, QVariant (kinda) and most of the commonly used Qt Data Structures. If you need to use a string as the key, you'll need to explicitly set an upper limit by using something like QVarLengthArray.
This is a huge problem, and makes using the shared hash very difficult in practice.
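To illustrate the restriction, here is a sketch of a key type with an explicit upper limit. It uses a plain `char` array in standard C++ rather than any Qt class, and the `FixedString` name is hypothetical. The point is only that all of its bytes live inside the object, so copying the object into shared memory leaves no pointers into one process's heap behind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical fixed-capacity string. N includes the terminating '\0'.
template <std::size_t N>
struct FixedString {
    char data[N];

    FixedString() : data{} {}

    explicit FixedString(const std::string& text) : data{}
    {
        assert(text.size() < N); // the explicit upper limit
        std::memcpy(data, text.data(), text.size());
    }

    // Unused bytes are always zero, so comparing the raw arrays is enough.
    bool operator==(const FixedString& other) const
    {
        return std::memcmp(data, other.data, N) == 0;
    }

    std::string toString() const { return std::string(data); }
};

// A raw byte copy of FixedString is a valid object ...
static_assert(std::is_trivially_copyable<FixedString<32>>::value,
              "FixedString can be copied byte by byte");
// ... which is not true for a string that owns heap memory.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string cannot be copied byte by byte");
```

The cost is obvious: every key occupies N bytes no matter how short it is, and anything longer than the limit cannot be stored at all.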
The Nepomuk development process has been fairly closed to outsiders. This was never a conscious decision made by the Nepomuk Team; we just never made the effort to open up the development and make it more appealing to newcomers.
For most of the last 2 years, the development model has been Sebastian and I working on our own personal list of things, and occasionally sending private emails to each other or communicating via IRC.
I'm trying to change all of this. The aim is to make Nepomuk development more open and friendlier to new developers. Getting developer feedback at the Nepomuk BOF at Akademy was the first step.
Compiling Nepomuk
One of the largest hurdles in contributing to Nepomuk was that the code resided in kdelibs. Fortunately, with 4.9, we have our own repository - nepomuk-core, where most of the Nepomuk related code has moved.
For most of Nepomuk development, you just need to compile nepomuk-core. However for some stuff like the kioslaves, controller, and kcm, you'll still need kde-runtime.
Nepomuk Tasks
Over the last week, I've created a separate mailing list for Nepomuk Bug Coordination, and I've been adding simple tasks to bugzilla (mostly from my personal TODO list).
The idea is to make the Nepomuk development process more public. Anyone can see the current list of tasks that we are working on, and which developer is working on which task (Assigned field).
Right now there are just about 20 tasks at Bugzilla. I'm in the process of splitting down the big tasks, like Fix Nepomuk Backup, into smaller chunks which individual developers can handle.
I've listed some simple tasks which do not require any knowledge of RDF, SPARQL or the ontologies.
Also, if you have any suggestions on how we can make Nepomuk development more appealing or be more open about what we are working on, please let us know.
In case you haven't been running trunk, or haven't tried out the latest release candidates (you really should), you should know that dolphin has had a lot of major improvements since the last release. One of the features I really love is better Nepomuk integration. This enhanced user visibility lands up exposing bottlenecks, and performance problems. That's exactly what happened with the nepomuk search kio slave.
Dolphin exposes this neat sidebar which allows users to list out the different kinds of files - Documents, Images, Audio and Videos. A little more than a month ago, I tried it out and was quite disappointed with both the performance and the results. It was quite normal for dolphin to pop up a message saying that the 'nepomuksearch' kio-slave had crashed.
Needless to say, RC2 fixes most of the issues and the performance is quite good.
The Original Architecture
Step 1 - Remove threading
If you look closely you can see the large number of useless threads that are spawned in the process of one query. The kioslave spawns a separate thread in the kio_nepomuksearch process, and then the Query Service spawns another thread on which it actually runs the query and sends the results back via dbus.
The thread of the nepomuk kioslave is just running an event loop waiting for it to get the results over DBus. This is especially useless, cause kioslaves can be blocking.
Step 2 - Skip the middle man
The second big bottleneck is the transmission of the query results over DBus via the queryservice, and the extra thread that is spawned cause of that. You might just ask - "What's the point of the query service anyway?". Well it exists to cache query results and to provide query updates, so it isn't a complete waste.
However, the extra overhead didn't seem worth it.
The Current Architecture
The extra threading has been removed from the kioslave, and it now directly communicates with the storage service to run the queries. It seems a lot cleaner this way, and is a lot faster.
For those of you who don't know, I'm currently at Tallinn, Estonia for Akademy 2012.
Here at Akademy, yesterday was an amazing day for Nepomuk. Early in the morning we had a Nepomuk BOF titled 'Constructive Criticism - Prioritizing Nepomuk development', and it was really good. I wasn't sure if I should have named it that, cause it might have ended in people just coming and complaining, but surprisingly everyone was really nice and constructive.
During the start of the sprint, we decided to list down each of the Nepomuk components along with their respective developers. Once that was done, I just explained what we are currently working on, and then the 'request for comments' period started.
The initial discussion was related to the PIM feeders and how they still regularly cause high cpu spikes. I don't personally use PIM, and since Christian is the maintainer of that code base, he handled most of the comments. The overall conclusion was that we need to schedule the indexing at a better time. For this, we would need some help from Solid for better idle detection.
After this we started with a discussion of the most user visible application using Nepomuk -- Dolphin. For one, it should only be showing files, and the listing of the query results should be a lot faster. Maybe even something along the lines of on-demand loading. And finally, we talked about how it is not possible to search through both the file name and content at the same time.
With this discussion we obviously started discussing the current search interfaces. We wanted a separate plasmoid just for searching (something similar to Crystal), but someone mentioned that we already have 2 places for searching in KDE - krunner and kickoff, and both of them do not show the same results, which can be confusing. Marco, from Plasma, mentioned that maybe krunner could be modified to show the search results in a better way. Eventually, we concluded that this is an open discussion which still needs a proper solution.
A couple of other issues that were discussed were better technical documentation, ontologies, old data cleanup and migration. We also discussed shipping meta data writeback, which is the process of writing the Nepomuk Metadata back into files/Akonadi/some other representation.
We then discussed the state of Nepomuk and media players - Amarok, Plasma Media Center, and Bangarang, and how there may be a need for a central library for managing multimedia via Nepomuk. And finally we went on to discuss the current state of Digikam and Nepomuk integration, which is very old and needs a lot of work.
And that was it. Well, not quite.
Later during the day, after the Randa sprint, we had an impromptu meeting discussing the state of meta-contacts and its support in both Telepathy and Akonadi. Fortunately, we had all interested parties present - Telepathy and Akonadi developers, and the discussion went on for quite some time.
My initial approach for solving meta-contacts using the ontologies had some flaws. Having all the interested developers discuss it out and come to a conclusion was amazing. It's difficult to get started with implementing proper Nepomuk support for Telepathy if we all cannot decide on how to use the ontologies.
We even went on to discuss semi-automatic contact merging and a lot of stuff related to Telepathy. Overall, a really productive day!
The Nepomuk team has been working really hard to fix the problems with virtuoso consuming too much memory and often just going bat crazy. And now finally we've figured it out.
It wasn't the obvious solution, but we think it's going to work out very well. From now on virtuoso won't be shown in ksysguard.
Since we won't be able to see virtuoso using up our memory and CPU, it obviously won't be doing it. Now, I get that this logic is a little brain-dead, cause in Linux we have other ways of monitoring our processes as well.
Fortunately, Trueg is in the process of patching up top, and I'm going to be contacting the kernel people to see if we can remove the virtuoso PID from /proc.
Back in October 2010, I was trying to write automated tests for Nepomuk
Backup. That turned out to be a huge disaster, but the test suite still had
some pretty good stuff in it.
Over the last month, I've finally moved it to git, cleaned it up and started
using it for real tests.
Why do we need a testing framework?
The Nepomuk Architecture is extremely decentralized. We have one central
storage service which handles virtuoso, and other different services for
monitoring files, indexing them, performing queries and so on.
These services or plugins often require other services to be running and form a
dependency chain. In order to properly test any of them, we need the dependencies
to be satisfied. That's where it starts to get messy. Especially cause
Nepomuk primarily uses local sockets and DBus to communicate.
Right now we have a lot of tests that test individual classes from the services,
but nothing that tested if files are actually getting reindexed after they are
modified. Such things were always tested manually.
Now, with this testing framework, we can launch a separate KDE and DBus
session and run tests.
Another reason why we really need this is that from KDE 4.8.1
Nepomuk::Resource also uses DBus in order to write back any of its changes.
That effectively kills all of its current unit tests.
And we need unit tests!
Source Code
The code is still in my scratch repo, and there are very few tests, but we're
getting there. It has already helped me replicate a nasty PIM Feeder bug, so
Yaye! :)
I land up using wget a lot. I know there are better alternatives, but
wget's simplicity has won me over. Plus, with applications like firefox, I'm not
always sure I'll be able to continue the download. That's important when I'm
downloading big files, but for small files, it really doesn't matter.
A couple of days back, Martin and I were chatting about storing the metadata of
a downloaded file in Nepomuk. I knew it wouldn't be hard. The ontology is already in
place, so it was just a matter of pushing the data into Nepomuk.
I estimated that it would take us around half an hour to code a simple
prototype. Yesterday we finally decided to do it.
It took around an hour. :)
So, we present to you -
Nepomuk Add Download Metadata (NADM)
Yes the name is weird. In fact deciding on the name was the hardest part. The
Nepomuk code was just around 30 lines -
NADM is a simple executable which when presented with the file and download url
will attach the corresponding metadata. So, people who prefer commands can
just call it after wget, or write a script to do it automatically.
I meant to release a new version of Notably on Friday, but I got sidetracked
with some stuff. Plus, I've been spending a lot of time on designing the UI for
this release, which I think isn't a good idea. Notably is still not quite mature,
and I think right now features are more important than polish.
Last week, I showcased some tagging UIs. They aren't yet ready to be deployed
in KDE, as they need to be polished quite a bit. Plus, there is a lot scope for
collaboration when designing UIs.
Changes
Revamped UI
I've gotten rid of most of the custom KWin code. I'd initially wanted my
application to look quite different, with a blurred background and fixed size.
But that would be locking the user into a fixed interface.
Notably now looks and behaves more like a KDE application. (No more blurred
background)
Better Sidebar
Most of the code improvements have been in the sidebar, which now acts as a
proper menu and allows navigation.
Experimental Widgets
Some brand new widgets;
Tag Widget
I showcased the new Tag Widget I was working on a couple of days ago. Since
then, I've improved the code to make it more maintainable, unfortunately it
still needs a lot of work.
Tag Cloud
Creating a Tag Cloud turned out to be a greater challenge than I expected.
Right now it's implemented with some basic HTML in a QTextBrowser. I'm still
experimenting with some custom layout code. Let's see how it goes.
Tag Browsing
You can browse your notes based on the tags they have been given. This will
eventually have to be expanded to allow multiple facets - like tags, dates and
so on. Implementing it on the Nepomuk side is fairly simple, but I'm not sure
about the interface.
After a couple of more releases when I've gotten most of the main features
down, I'll start on polishing it up and moving it to extragear :)
There have been cases of virtuoso going a little crazy and consuming a lot of
CPU cycles. It's extremely frustrating. However, it's even more annoying when
you have no idea what's wrong.
Most of the bug reports we get just say that virtuoso is consuming too much CPU,
and that isn't the least bit helpful. So, here is a short guide to figure out
what query is causing virtuoso to go crazy.
Listing Queries
Nepomuk contains a query service which is used to cache queries and to
execute them asynchronously. We can use it at any point to figure out which
queries are being executed.
select distinct ?r ?v2 where { { ?r a
<http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#Note> . ?r
<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#created> ?v2 . }
. ?r <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible>
?v1 . FILTER(?v1>0) . } ORDER BY DESC ( ?v2 )
This query is extremely important cause without it finding the cause is
nearly impossible.
Killing queries
$ qdbus org.kde.nepomuk.services.nepomukqueryservice /nepomukqueryservice/query4 close
This will end the query
When/If you find virtuoso consuming too much cpu, list out all the queries
and close each of them one by one. The moment virtuoso gets better, you'll have
your culprit.
That's the query you should post in the bug report.
A long long time ago, a very simple tagging widget was implemented. We always
thought - "Eh! This is temporary. We'll come up with a better one later." But
that never happened.
There is a lot of code in Nepomuk. However most of it is backend stuff which
does absolutely marvelous things behind the scenes - Auto duplicate merging,
type checking with respect to the ontologies, caching and lots more. We,
however, lack good UIs.
So, if you're a UI designer looking for a challenge, look at Nepomuk. We have a
lot of data.
Anyway, enough promotion! Unlike yesterday, I won't be pointing you towards the
source (though it isn't that hard to find). I'll just be showcasing some
screenshots. You'll get to try out the tagging widget and
whatever-is-in-store-for-tomorrow on Friday.
This was originally implemented with a QListView in flow mode with a custom
delegate for tags. Getting it to automatically resize was a pain, and I was
missing out on a lot of effects. Eventually, a couple of hours back, someone at
#qt pointed me towards Flow Layouts.
I'm in the process of rewriting the old item delegate code, to a widget based
one. Minus minor variations it should look the same.
As last time, if someone can make a nice mockup, I'll be more than happy to
implement it :)
Welcome to Nepomuk Tag Week! Well, not really, since it's not an official thing. I've just been working a lot with tags lately, and this week I'm going to be spamming you with some tag related updates (One for every day of the week, minus Monday)
I thought I'll start with something small - Tag Management.
We've been badly needing a UI to allow the users to modify, merge and delete their tags. You could always delete them using the conventional "Add Tag" dialog, but this way you can do batch deletes.
I'm not much of a UI designer so the interface is quite bare. I'm hoping
that someone can come up with a beautiful mockup, which I can then implement.
Prototyping is fun. You don't need to care about proper libraries. Your code can be absolutely horrible, cause "Hey! It's just a prototype!"
Yesterday, I started the process of importing my entire gTalk chat history into Nepomuk. It turned out to be a lot simpler than I thought it would be.
Step 1: Get the chat logs
GMail fortunately allows you to export your chat logs via IMAP. They don't implement the traditional XEP-0136 for fetching archived messages. But at least, unlike Facebook, they provide a mechanism.
I landed up using getmail for importing all chat logs.
getmailrc
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Gmail]/Chats",)
username = *****@gmail.com
password = ********
[destination]
type = Maildir
path = ~/Chats/
I originally wanted to use offlineimap but they seem to have a problem fetching the Chats in GMail.
Step 2: Write a parser
The chat logs are presented in a custom xml format encapsulated in the email. The content was in the traditional quoted-printable format, as most emails are. Writing a parser didn't take too long. Plus, with the new Nepomuk Datamanagement APIs, pushing them into Nepomuk was even simpler.
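For reference, decoding quoted-printable is a small job in itself. The function below is a minimal sketch of such a decoder in standard C++. It is not the code from the prototype, and the names `hexValue` and `decodeQuotedPrintable` are made up for the example. It handles the two rules that matter here: `=XX` stands for one byte written as two hex digits, and a `=` at the end of a line is a soft line break that joins two lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1 if c is not one.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10; // lenient: accept lowercase
    return -1;
}

std::string decodeQuotedPrintable(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '=') {
            out += in[i];
            continue;
        }
        // Soft line break: "=\n" or "=\r\n" joins two lines.
        if (i + 1 < in.size() && in[i + 1] == '\n') {
            i += 1;
            continue;
        }
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;
            continue;
        }
        // "=XX" is one byte written as two hex digits.
        if (i + 2 < in.size()) {
            const int high = hexValue(in[i + 1]);
            const int low = hexValue(in[i + 2]);
            if (high >= 0 && low >= 0) {
                out += static_cast<char>(high * 16 + low);
                i += 2;
                continue;
            }
        }
        out += '='; // malformed escape: keep the character as it is
    }
    return out;
}
```

After this step the body is plain xml again and can be handed to any xml parser.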
Ideally, this should be implemented as a strigi analyzer, so that it becomes a part of Nepomuk's Indexing framework. But hey! It's a prototype!
What's the point of having your chat logs in Nepomuk
Well, for one, the Telepathians can use this to show chat logs. We'll obviously need a better way of importing the chat logs. Manually calling nepomuk-chat-feeder obviously isn't an option. So we'll need to find a proper way of fetching chat logs.
The second, more personal, use is that I finally have a usable dataset to determine important people in my life - based on the chat frequency and timings. AFAIK Facebook internally uses a combination of likes, comments, chat history and stalking to determine how important a person is to you, and accordingly place them higher in the auto-completion list and chat sidebar.
This obviously has many other applications like altering the chat list based on the people you converse with when you're doing one activity.
In India, we have 16 premier engineering institutes. They go by the name of IIT (Indian Institute of Technology), and are considered to be among the top engineering colleges in India.
In a couple of hours, I'll be flying down to IIT Madras to attend their annual technical festival: Shaastra. I've been invited to conduct a hackfest, whose primary objective would be to introduce people to the world of KDE. From what I know, there are 25 people signed up right now.
By the looks of it, this 3 day hackfest (29th September - 1st October) is going to be amazingly exhausting and rewarding. If there are any KDE developers in Chennai, please stop by! I could use your help.
Apart from me (KDE), we have Yuvi Panda coming on behalf of Wikimedia. Yuvi is also a GNOME developer who works on Cheese. Additionally we have a developer from Drupal, and a hardware hacker who is going to help everyone twist some wires with Arduino.
What to expect
My main focus is going to be bug fixing, so I'm on the lookout for simple bugs that can be fixed by newcomers. So far I've made a list of around 15 bugs. Quite a few of them are from my note taking application, Notably. So far, I have junior jobs from Telepathy, Nepomuk, Soprano, Yakuake, Choqok (maybe), and Parley. If anyone knows any simple bugs or Junior Jobs, please let me know.
I doubt this event will be as big as conf.kde.in was. But I'm still hoping to get quite a few Indian students involved with KDE.
I've already prepared a VirtualBox instance which runs KDE trunk, so they should be able to skip the initial hassle of setting up a KDE development environment.
Another week has gone by and I've decided to release another version of my favorite note taking application. Like last time the code is still in my scratch repository.
New Features
Tagging Support
I've added tagging support. So, now you can tag all your notes. I didn't want to use the traditional tagging widget, and wanted something that was more usable. So I ended up with a simple text box where you can type your tags.
Background Blur
Most of the notes I write are personal, and I don't want to be bothered by the rest of the world when I'm in my typing phase. One option is to kill both the plasma-desktop and kwin, but I didn't think the users would like that, so I've settled with just blurring the background.
Show/Hide Animation
The current default for showing and hiding the notes window is Alt + K. I've been told that it might conflict with certain menus, so maybe I should use a better default. The shortcut is obviously configurable like everything else in KDE.
With v0.1 there wasn't any animation and the window would just appear. It seemed rather odd at times. I've added a simple fade in and fade out animation when you try to show/hide the window.
I'm Still Looking For a Good Name
I'm growing kind of fond of the name Notably, which was suggested by Sebas. Does anyone have any other suggestions? I want to avoid using the term Nepomuk in the name, as that is just the technology I'm using to make this note taker awesome. The users shouldn't really be bothered with those details.
Possible Keywords - Semantic and Connected
Maybe next week when I've decided on a name, I'll move it to the playground.
Note Browsing
There is still no way for you to browse the notes you've written. For now, you can use the awesome SemNotes application. Both Nepomuk Notes and SemNotes use Nepomuk as a storage backend, and since we use the same ontologies, the notes are completely compatible.
About 5 months back I started jotting down notes about stuff that was going on. Some of them were personal, and some technical. For the personal stuff I had a separate folder where I would chronologically arrange my notes, with a separate file for each day. The technical ones would be saved in other random places, invariably getting lost. Thinking of a unique file name and file url is hard!
About a week after I started writing a lot of notes, I decided that I needed a proper way to categorize my notes. Nepomuk obviously came to mind. But I never started with the application.
More months went by, and I desperately wanted a more meaningful way to take Notes. I wanted my note taking application to understand when I was talking about a certain individual and what the note was about.
Eventually, it struck me - I need something as simple as Yakuake. A simple shortcut which opens up a text editor, lets me write my note, and then disappears.
So, after the Desktop Summit, I took some time and wrote a very simple version of the note taker -
It’s a nice change to actually use Nepomuk for creating something instead of working on backend stuff all the time. Hopefully, as an application developer, I’ll be able to better gauge the Nepomuk API and improve it as required.
The code is currently in a scratch repo. Once I’ve added some proper features, I’ll ask for it to be moved to playground/extragear.
Flash is annoying. Especially on Linux, where it goes crazy at times and starts gobbling up your CPU. That’s one of the reasons why I really think the HTML 5 video tag is the way forward.
YouTube has had an HTML5 beta for quite some time. Unfortunately, I don’t like viewing videos on the YouTube player. I like the feel of my favorite media player - VLC. The great thing about the Flash videos was that they used to be cached in /tmp/Fl*. And then, Adobe changed their Flash cache directory.
Fortunately, I found this script somewhere -
#!/bin/bash
# Copy Flash videos that the plugin still holds open under /tmp.
# Usage: <this script> [target-directory]
target="${1%/}"   # optional destination, trailing slash stripped
for pid in $(pgrep -f flashplayer); do
    # Every matching lsof line describes one open /tmp/Flash* file
    lsof -p "$pid" | grep '/tmp/Flash[^ ]*' | while IFS= read -r line; do
        # /proc/<pid>/fd/<fd> still gives access to the cached file
        src=$(echo "$line" | awk '{print "/proc/" $2 "/fd/" $4}' | sed 's/[rwu]$//')
        name=$(echo "$line" | awk '{print $9}' | awk -F '/' '{print $3}')
        if [ -n "$target" ]; then
            if [ -d "$target" ]; then
                echo "Copying $name to $target/"
                cp "$src" "$target/$name.flv"
            else
                echo "The directory \"$target\" doesn't exist"
                break
            fi
        else
            echo "Copying $name"
            cp "$src" "$name.flv"
        fi
    done
done
After switching to the HTML 5 Beta, I needed a new script.
#!/bin/sh
#
# A Script that runs all WebM files present in the FireFox cache with vlc
# media player.
#
# Author: Vishesh Handa <me@vhanda.in>
# The glob is stored quoted and expanded when find is called below
CACHEDIR="$HOME/.mozilla/firefox/*/Cache/"
files=`find $CACHEDIR -mtime -1 -size +1M -regex '[^_]*' \
    -exec file -F ' ' {} \; | grep WebM | awk '{ print $1 }'`
for f in $files; do
    echo "$f"
    # Plain sh has no '&>', so redirect stdout and stderr explicitly
    vlc "$f" > /dev/null 2>&1
done
Today is the last day of my 2 month long summer internship with Mandriva. It has been a fun ride. I was supposed to have been working on Metadata Sharing and improving Nepomuk’s infrastructure. Metadata sharing still has a long way to go, but I’m happy with how Nepomuk is turning out.
The number of bugs has been steadily decreasing and, with the introduction of the Data Management Service, we’re now ready to try out more creative uses of Nepomuk. (Yes, I’ll be talking about these more “creative” uses soon.)
How many of you know that Nepomuk is an abbreviation? Oh! You knew that? Well, you have 5 seconds, can you tell me its full form? I doubt it. Unless your fingers are really fast, and you managed to open the Wikipedia article of Nepomuk in less than 5 seconds.
The ‘N’ in Nepomuk stands for Networked. And that is exactly what I’ve been working on over the last week. So far it works quite well over a local Network.
Metadata Sharing?
One of the most requested features in Nepomuk is the ability to share the data present, especially when you’re dealing with real world data like Projects, Events, and People.
One of the most obvious use cases that I can think of is sharing of tags - Not only file tags, but maybe even photo tags. This should be possible the moment we export the tags from digikam into Nepomuk properly.
Right now, I’m able to query my entire desktop from my laptop. I never realized that I have so many songs and movies.
Example Code
If you’ve ever tried to query the Nepomuk Repository, you generally use the QueryServiceClient. I’ve tried to replicate a similar API.
// Construct a simple query
Nepomuk::Query::LiteralTerm lt(QLatin1String("Maroon 5"));
Nepomuk::Query::Query q( lt );
// Query all the available repositories
foreach( const QUrl& repositoryUri, repoList ) {
    Nepomuk::NetworkQueryServiceClient *nqsc =
        new Nepomuk::NetworkQueryServiceClient( repositoryUri, this );
    connect( nqsc, SIGNAL(newEntries(QList<Nepomuk::Query::Result>)),
             this, SLOT(newEntries(QList<Nepomuk::Query::Result>)) );
    connect( nqsc, SIGNAL(resultCount(int)),
             this, SLOT(resultCount(int)) );
    nqsc->query( q );
}
Current Architecture
Nepomuk Sharing is still in its very early stages. The ontologies aren’t even finalized. So everything I describe here is subject to change, and probably will change.
I’ve currently only implemented sharing over a local network, so that is all I’m going to be talking about.
The process is broadly divided into 3 parts:
Repository Identification
Finding other Repositories
Communicating with them
Repository Identification
We need some unique way of identifying each repository. For that we use a UUID: each Nepomuk Repository would contain a resource of type nso:Repository, which contains its UUID.
A Repository can belong to a certain person. So, you can query someone’s laptop/phone/server, or just specify that person, and let Nepomuk query all the available devices.
Finding other Repositories
I chose the simplest mechanism for finding existing repositories over a local Network - DNS Service Discovery. If you already know about it, then skip the next paragraph.
ZeroConf is a set of techniques that provide 3 core technologies - Link Local addressing, multicast DNS ( hostname resolution without a DNS server ), and service discovery through DNS.
Service Discovery or DNS-SD allows us to browse which services are available over a network. Each machine broadcasts ( actually multicasts ) the services that it provides using simple DNS records. Avahi, which is a free implementation of the ZeroConf protocols, provides DNS-SD. The Nepomuk Metadata sharing service advertises a DNS SRV record of type _nepomuk._tcp. This is done using KDE’s DNSSD library.
Communicating
For communicating with other repositories, I’ve implemented a simple HTTP server, which acts as a slight variant of a SPARQL endpoint. Conventionally SPARQL endpoints respond to requests of the form:
GET /sparql/?query=EncodedQuery HTTP/1.1
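The "EncodedQuery" part is the SPARQL query percent-encoded and passed in the `query` URL parameter. Below is a rough sketch of building such a request line with nothing but the standard library. The helper names `percentEncode` and `sparqlRequestLine` are made up for this example and are not part of the sharing service.

```cpp
#include <string>

// Percent-encode everything except the RFC 3986 "unreserved" characters.
std::string percentEncode(const std::string& text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: the HTTP request line for a query sent to an
// endpoint that lives at `path` (for example "/sparql/").
std::string sparqlRequestLine(const std::string& path, const std::string& query)
{
    return "GET " + path + "?query=" + percentEncode(query) + " HTTP/1.1";
}
```

For instance, `sparqlRequestLine("/sparql/", "a b")` gives `GET /sparql/?query=a%20b HTTP/1.1`.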
In Nepomuk we encourage the use of the Nepomuk Query API, which allows us to optimize the queries internally, and create them programmatically. The Nepomuk endpoint accepts requests of the form
The code is available at kde:/scratch/vhanda/nepomuk-metadata-sharing. As of posting this the HEAD is at b98205a0600908fe0e8dba49ec8fb9e78edeef5b. You might want to use that version, as I give no guarantee that I won’t completely change everything. This is still totally experimental.
You’ll need to use the “nsoRepository” branch from the Shared Desktop Ontologies.
Running
The standard cmakekde should do the trick.
In order to run the code you will need Avahi running, along with mDNS, so that you can resolve local addresses like vdek.local. Run the service via
nepomukservicestub nepomukmetadatasharing
on each machine whose repository you want to share. You can run queries with the test app -
nepomuk-sharing-test 'hasTag:Fire AND hasTag:Water'
or use the NetworkQueryServiceClient.
The Future
Over the next couple of weeks, I’m hoping to implement some privacy controls, and allow the queries to be sent over XMPP via Telepathy.
A couple of days ago, I finally jumped ship. I moved away from Wordpress. My blog looks a lot simpler now. I should probably state that I have shamelessly copied the CSS style from Ant Zucaro. I’ll get around to modifying it to suit my needs.
My last attempt to move from Wordpress, was a spectacular failure. That migration is the reason there are no blog posts between July and November 2010. This happened because I chose Jekyll, a ruby based static website generator. My ruby skills are next to non-existent. Trying to mess with code you can’t read, along with being thrust into the world of Ruby with its own packaging mechanisms and what not, can confuse the crap out of some one. That coupled with my desire to write the css from scratch, ended in a monumental disaster.
The language of my choice was Python, and I found quite a few contenders. Hyde, which is the Python equivalent of Jekyll, was the first one that caught my eye. It’s based on Django, which unfortunately still uses Python 2. I really like Python 3. I eventually stumbled upon Blogofile, which suited my needs perfectly.
It’s extremely simple, and has most of the features anyone would want from a blog. It however did not offer different feeds for each tag. It took me some time, but I managed to hack my way and generate them. I still need to clean up the code, and push it upstream.
On Monday around noon, my virtual private server provider had a complete disk failure on one of the nodes containing a set of VPSs. Unfortunately, I did not have a backup, and neither did my server provider.
Yea, that sucks!
Fortunately, I’m a big believer in data redundancy and have duplicates of most of my important data on my desktop and laptop. I apparently do not consider my blog that important, cause there were no backups of it.
Since migrating my blog from wordpress.com to my own server ( July 2010 ), I’d only made 3 blog posts. Thankfully Google indexes everything. So, with a little bit of effort and some SQL, I was able to restore those 3 blog posts along with the correct time stamps. I still haven’t managed to add the comments manually. If anyone has a nice script to do that, please let me know.
Yes, this is the reason my blog looks different! :P
This is something I showcased at my talk in conf.kde.in. I coded it in a couple of hours. It is a lot of fun to play around with! I can’t wait to index all my videos this way.
Impressive? :D
This was showcased at the beginning of the Nepomuk talk under the section “Nepomuk is not about searching, but here are some cool things that it can do!”
How does this work?
Although this looks fairly impressive, it’s quite a simple procedure. Nepomuk doesn’t use any fancy audio heuristics or voice recognition. It simply searches through the subtitles and plays the video at the appropriate place.
This entire project was supposed to be used with the Nepomuk Web Extractor, which aims to allow Nepomuk to get additional information from the internet to annotate your files/contacts/events. But I haven’t had the time to implement a web extractor plugin, so you need to manually provide the subtitles.
The Source Code
The code can be found at my git scratch pad. It is a slightly unstable prototype that was created just to illustrate subtitle searching in Nepomuk. In order to load the subtitles you have to run
and you can run the search GUI with – nepomuk-subtitle-search.
I’m going to try my best to get this ready in time for 4.7. Ideally, this should be somehow integrated in Dolphin, but that is going to be tough. It’s high time users start seeing Nepomuk do some really cool things!
I’m going to state the most utterly insane thing I can think of – “You have not heard about conf.kde.in!“. I know, I’m crazy! But just in case you haven’t – conf.kde.in is the first Indian KDE Conference to be held in Bengaluru from the 9th – 13th March.
We have a spectacular list of talks about topics ranging from Localization to Packaging to hard core coding! Additionally we have a series of workshops planned to get you started with KDE and help you discover the amazing KDE community.
From a Google Summer of Code perspective this conference is even more important as you get to meet old GSoC students, and many potential mentors. Plus, you get hands on experience in the various technologies used in KDE.
On a more personal note, I’ll be speaking about the wonders of Nepomuk, and how it is not just about searching. There is a whole world of the Semantic Desktop and its potential is just mind blowing!
Roughly 2 weeks ago, just before the feature freeze, the Nepomuk backupsync service was pushed into the trunk.
A long time ago I posted about a lot of features of the ‘planned’ Nepomuk Backup service. I did end up implementing most of those features, but ultimately disabled some of them for a more stable backup experience. I thought it would be better to have a stable backup tool without all the frills than an unstable tool which may potentially mess up your data.
So, without further ado I present Nepomuk Backup -
Nepomuk now provides automated backups.
The Nepomuk Backup GUI
The user interface can be invoked by the ‘nepomukbackup’ command, or can be called from the Nepomuk KCM.
**DBus interface**
The Backup GUI is completely independent of the backupsync service, as they communicate with each other through DBus. The main advantage of using DBus is that backups can easily be integrated into your existing backup solution. The command to perform a backup is -
If a blank URL is given, the backup is performed in the default location which would be `$KDEDIR/share/apps/nepomuk/backupsync/backup`.
The backup could similarly be restored using DBus, but I would recommend using the gui for restoration as it allows you to resolve conflicts.
Index everything
Honestly, do it! It’s a lot easier for you and for us. Plus you’ll have a much better Nepomuk experience if you index everything. With Nepomuk Backup, the more stuff you have indexed, the fewer conflicts you can expect when restoring a backup.
Nepomuk Sync
This entire post I’ve been calling it the backupsync service, and yet I’ve only been talking about backups. That’s because there is currently no user interface for performing syncs. The infrastructure is there ( and it works! ). In the coming weeks I’ll start working on the user interface. Maybe I can release it in extragear? Either way, I will try my best to get Nepomuk Sync into 4.7.
Everyone is out to lunch, and I’m sitting outside the conference rooms waiting for Aaron’s talk to start. I’ve still got a slight hangover from last night’s wicked party!
The day was pretty eventful, the party was freaking amazing. I didn’t attend all the talks, though. Most of the time I was sitting outside conference room #2 discussing Nepomuk and Sharing of semantic data. It was fun.
My camera seems to be misbehaving, so I don’t think I can post any pictures right now, but you could check out the Flickr group.
How time flies.. a month of the official coding period is already over, and I’m nowhere even close to saying that this part works perfectly. Most of my code is still in the “requires exhaustive testing” stage, but I can say that the core parts have been implemented.
In case you’ve forgotten, or didn’t read the title, I’m implementing Metadata Backup & Sync. Well I’m supposed to be implementing sharing as well, but ignore that for now. The core parts are done and I’ve started working on the user interface. Creating interfaces is something I’ve never been good at. I guess it’s more of an acquired trait. Very few people would be very good interface designers from the start.
The screenshot should say it all.
Syncing and Backup both have a common base, and work on the same principle. While syncing maps the metadata with another machine’s metadata, backup maps it to a file (Zip Archive). So, when I say the core has been implemented, I mean all of the ground work has been done.
When designing backups, integrability was one of my prime concerns. I wanted people to be able to integrate metadata backup with their existing backup solutions with minimal effort. Providing only a graphical interface just wouldn’t cut it. An interface which could be seamlessly integrated with other programs was required. The most obvious solution I could think of was DBus. Performing a backup is as simple as calling a method (with no arguments) from DBus.
Another factor that was of major concern was performance. Backing up metadata needs to be fast. If it isn’t, backups will get side-tracked for more important tasks. And when a catastrophe does occur (and it will), you’ll lose important data. Currently, backing up even slightly large databases ( with over 10000 statements, worth backing up ) takes less than a couple of seconds. And that’s just for the first backup. Subsequent backups would only store the changes, and are nearly instantaneous.
Every time you perform a backup, a checkpoint is created. So if you ever want to revert your database to a previous state ( maybe you synced some data which you didn’t want ) you can revert the Nepomuk database to any of the checkpoints. Reverting changes is non-destructive and you can even revert a revert.
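To give a feel for why storing only the changes makes reverts cheap, here is a toy model in standard C++. This is an illustration only and not how the backupsync service is implemented: statements are reduced to plain strings, and the names `ChangeSet`, `diffStatements`, `applyChanges` and `invertChanges` are invented for the sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// A statement is just a string here; real statements would be RDF data.
using Statements = std::set<std::string>;

// What changed between two checkpoints.
struct ChangeSet {
    Statements added;
    Statements removed;
};

// Compute the changes that turn `before` into `after`.
ChangeSet diffStatements(const Statements& before, const Statements& after)
{
    ChangeSet changes;
    std::set_difference(after.begin(), after.end(),
                        before.begin(), before.end(),
                        std::inserter(changes.added, changes.added.end()));
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::inserter(changes.removed, changes.removed.end()));
    return changes;
}

// Replay a change set on top of a state and return the new state.
Statements applyChanges(Statements state, const ChangeSet& changes)
{
    for (const std::string& s : changes.removed)
        state.erase(s);
    state.insert(changes.added.begin(), changes.added.end());
    return state;
}

// Reverting means applying the inverse change set. Inverting twice
// gives back the original, which is what "reverting a revert" amounts to.
ChangeSet invertChanges(const ChangeSet& changes)
{
    return ChangeSet{changes.removed, changes.added};
}
```

In this model a checkpoint only needs to keep its change set: going back is `applyChanges(current, invertChanges(changes))`, and nothing is ever thrown away.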
Syncing works on the concept of sync files. When synchronizing with another machine, that system must create a sync file and send it to your machine. The creation of sync files is the same as creating a backup and its performance is nearly identical. I haven’t yet implemented a mechanism to transfer the sync files, because that isn’t my prime concern. Maybe later, if I have time.
Restoring a backup or performing a sync isn’t that fast, mainly because, apart from finding the required files, folders, contacts, and other stuff, it also needs to decide which statements should be removed and which should be added. This is done by analyzing the time stamps. This entire process may require a certain amount of user-interaction, mainly to help determine the location of certain files. One way to reduce the amount of user-interaction required is to index most of your data. It helps a LOT!
I’ll try to keep posting updates as the project evolves. Till then if you have any suggestions about how I can better integrate backup or sync with any existing solution, please let me know.
Qt’s implementation of QMultiHash iterator is a little non-intuitive.
One would expect a multi hash to be implemented as <uniqueKey, list of values> pairs, where the list of values could even be a set. But QMultiHash is derived from QHash, and it stores the same key multiple times: it stores a number of <key, value> pairs, one per value.
This can cause some amount of confusion while iterating over a multi-hash with a QHashIterator.
Here is what I’d been doing (incorrect code)
QHashIterator<Key, Value> it( multiHash );
while( it.hasNext() ) {
    it.next();
    // Runs once per stored (key, value) pair, so a key with
    // several values is visited several times
    QList<Value> values = multiHash.values( it.key() );
    foreach( const Value & v, values ) {
        // Do whatever with v
    }
}
This resulted in the same key being iterated a number of times, which isn’t what I wanted.
This was the correct way of doing it -
// uniqueKeys() returns every key exactly once
QList<Key> keys = multiHash.uniqueKeys();
foreach( const Key & k, keys ) {
    QList<Value> values = multiHash.values( k );
    foreach( const Value & v, values ) {
        // Do whatever with v
    }
}
This was rather obvious once I understood the way QMultiHash stores its data.
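The same storage model exists outside Qt as well. Here is a small sketch that uses `std::unordered_multimap` as a stand-in for QMultiHash, purely so that it compiles without Qt. The function names `visitsPerKey` and `valuesPerKey` are invented for the example.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a QMultiHash<QString, int>: every (key, value) pair is
// stored on its own, so a key with three values shows up three times.
using MultiHash = std::unordered_multimap<std::string, int>;

// Naive iteration: counts how many times each key is visited.
std::map<std::string, int> visitsPerKey(const MultiHash& hash)
{
    std::map<std::string, int> visits;
    for (const auto& entry : hash)
        ++visits[entry.first];
    return visits;
}

// Grouped iteration: each unique key is handled exactly once, together
// with all of its values.
std::map<std::string, std::vector<int>> valuesPerKey(const MultiHash& hash)
{
    std::map<std::string, std::vector<int>> grouped;
    for (auto it = hash.begin(); it != hash.end(); ) {
        // Entries with equal keys are adjacent; equal_range returns that run
        const auto range = hash.equal_range(it->first);
        std::vector<int>& values = grouped[it->first];
        for (auto v = range.first; v != range.second; ++v)
            values.push_back(v->second);
        it = range.second; // jump past this key's entries
    }
    return grouped;
}
```

With the pairs ("a", 1), ("a", 2) and ("b", 3), the naive loop visits "a" twice, while the grouped version produces two entries: "a" with two values and "b" with one. The order of the values within a key is not specified.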
I’m ashamed to admit it, but I’ve never really understood forward declaration in C++. I know it’s used to speed up compile times, and it helps me fix cyclic dependency issues, but I’ve never really understood it. To this date, my approach of forward declaration has been - “Forward declare most of the things, and then include header files to fix compiler warnings.” I know that’s not the best approach, and I should have read about it earlier, but …
Here are the cases where you can’t use forward declaration -
**Base Classes -** When deriving a class Der from a base class Base, Base cannot be forward declared.
**Member Variables -** If a class A is used as a member variable then it can't be forward declared (references and pointers do not count).
I knew about the Base Class condition, but not about the other one. These conditions exist because the compiler must be able to work out the total size of the class from its definition. The compiler knows the size of pointers (and references), so those are fine.
Another issue that had been bugging me was the use of forward declaration with templates. After reading and experimenting this is what I figured out.
The syntax for forward declaring templates is NOT -- `template <typename T> class className<T>;` It is -- `template <typename T> class className;`
Templates must be forward declared with all their parameters even if some of them are optional
The template parameters follow the same forward declaration rules as any other member variables if the template is being used as a member variable. If the template variable is a pointer or a reference, you can safely forward declare its parameters.
A couple of other minor details -
You can't forward declare typedefs or enumerations (C++11 later allowed forward declaring enumerations that have a fixed underlying type)
Don't ever try to forward declare any class from the *std namespace*. For one it's not allowed, and two, many commonly used classes, like std::string, are actually typedefs for other classes.
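The rules above can be squeezed into one small example. It is a single-file sketch in which the top part plays the role of a header and the bottom part the role of the source file. The class names `Car`, `Engine` and `Tracker` are invented for the illustration.

```cpp
#include <string>

// --- what would normally live in car.h ---------------------------------
class Engine;                         // forward declaration, size unknown
template <typename T> class Tracker;  // forward declared class template

class Car {
public:
    explicit Car(int horsepower);
    ~Car();
    Car(const Car&) = delete;             // the raw pointers are owned
    Car& operator=(const Car&) = delete;
    int horsepower() const;
    int trackedObjects() const;

private:
    Engine* m_engine;            // fine: a pointer has a known size
    Tracker<Engine>* m_tracker;  // fine: pointer to a template instance
    // Engine m_spare;           // error: incomplete type, size unknown
    std::string m_name;          // std classes come from their headers
};

// --- what would normally live in car.cpp, after the real includes ------
class Engine {
public:
    explicit Engine(int hp) : m_hp(hp) {}
    int hp() const { return m_hp; }
private:
    int m_hp;
};

template <typename T> class Tracker {
public:
    void track(const T*) { ++m_count; }
    int count() const { return m_count; }
private:
    int m_count = 0;
};

Car::Car(int horsepower)
    : m_engine(new Engine(horsepower)), m_tracker(new Tracker<Engine>)
{
    m_tracker->track(m_engine);
}

Car::~Car()
{
    delete m_tracker;
    delete m_engine;
}

int Car::horsepower() const { return m_engine->hp(); }
int Car::trackedObjects() const { return m_tracker->count(); }
```

Uncommenting the `Engine m_spare;` line makes the compiler complain about an incomplete type, which is exactly the member variable rule from above.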
Maybe this blog post will help some poor bloke like me, who never bothered to fully understand forward declaration. :-/
Is GSoCer even a word? It seems like a Gnome version of soccer :-/
Well, this is my first post to the Planet, and I suppose an Introduction is warranted.
My name is Vishesh Handa, and I’m currently in my second year of B.Tech in Computer Science at Galgotia’s College of Engineering, India. I’ll be officially working on Nepomuk during the summer. (Yaye!) My project is labeled Metadata backup, sync and sharing. Don’t let the boring name fool you, it’s actually super interesting. Well, for me!
Metadata backup and sync are, well, going to be a little tricky to implement. To understand the problem, try imagining that you have a file, say SomeText.txt, and you have another copy of it on another system. You alter the file on both systems such that the copies don’t resemble each other at all (the checksum, filename, directory, everything). The metadata has been changed too, maybe you added a comment or changed its rating. How is the metadata synchronization service supposed to figure out which file on system B corresponds to the metadata which should be updated (synced) from system A?
This is where the new ontology comes into play.
The most interesting part of the project is going to be Metadata sharing. If everything goes according to plan you should be able to share your metadata. I’m super excited about it. Here is an example of what you can expect - With the Facial recognition project ( being done by Aditya Bhatt, a fellow Indian GSoCer ) you and your friend’s photos will automatically get tagged (If they have KDE!) Then you can effectively search through your friends photos to see where you’ve been tagged.
This is something I’ve been meaning to do for some time, and yesterday while debugging Nepomuk (KDE) I finally decided to get it over and done with. Why do I want to change the default prompt string? Well, there are several reasons.
Firstly, to conserve the screen width. I don’t log into other systems via SSH or anything, and hence the hostname is irrelevant to me. It was time to remove it. And secondly, to improve readability. With massive amounts of text gushing out, it gets a little difficult to read it all. Highlighting certain parts of it helps a lot.
In the end it was a simple - (this goes in the .bashrc file)
PS1='\[$(tput bold)\]\u:\w $ \[$(tput sgr0)\]'
export PS1
The reason I’ve used bold instead of some colour is because I tend to get bored easily. I usually change the appearance of my terminal at least once in 2 weeks. Bold seemed like a safe-bet.
Okay! So, I haven’t been posting at all. I know, I know! :-(
BUT, I made my first actual contribution to KDE a couple of days back! Hurray! :-D The last time I tried to fix a bug, it never really got accepted or rejected. The developer of KSnapshot never returned my emails, even after I pointed out that it was my first patch. This really dissuaded me from contributing to KSnapshot, and even though I knew how to fix other bugs, I never actually got down to doing it. What was the point if they never got accepted?
This time I fixed a bug in Dolphin, which is one of my favorite KDE applications. Give it OBEX support, and I wouldn’t need Nautilus for anything, and could probably remove it and the entire GNOME desktop.
The bug I fixed was one I discovered myself, and I never reported it! I know, I should have, but I really wanted to solve it myself. It took me quite some time, even though the solution was fairly obvious (Isn’t it always?)
A couple of weeks back I started learning about Qt, and then about KDE. I thought it was time I started contributing, especially since I can. KDE consists of millions of lines of code, and getting acquainted with such a huge system is a terribly daunting task. Fortunately, they have pretty good documentation.
One of the best ways to acquaint yourself with the huge code base is to, allegedly, fix a bug. I’ve never really fixed a bug before, so it seemed like a task which would need loads of experience, proper understanding of the code, and many other things. Turns out, it isn’t that way at all. Bugs are of all types, some are simple some are sweet.
This post is what got me started. Ksnapshot seemed like a simple enough program, and it is … kinda. This blog post is about how I fixed this bug, and the thought process behind it.
Getting the code - This was fairly simple, and after reading the SVN guide at KDE, I even understood the structure of the code base. I generally tend to use Bazaar to manage my code, and I’ve heard a lot about git. Svn is a step below, so it really wasn’t a problem for me.
Compiling the code - The only small problem I encountered was when I tried to build ksnapshot instead of kdegraphics. And on trying to compile kdegraphics I ran into loads of Nepomuk and ontology dependency problems. I fixed this by removing okular and gwenview and then running the entire cmake process.
Setting up the Environment – Many of the KDE techbase articles are about setting up the entire KDE code-base for development. This consists of either creating a new user (generally kde-devel) or using scripts. I preferred the new user approach, and downloaded the entire source code. But in the end, all of that didn’t really matter. It was just an inconvenience. So, I guess you should go with the recommended approach. I didn’t really understand any of the scripts and they didn’t do squat for me. For a small project like ksnapshot this wasn’t really required.
Now, came the actual part – Understanding the code. The usual problem I have with reading source code is that I don’t really know where to start. Every open source project is different and almost all of them are constructed differently. I don’t really have much experience reading source code – Gimp, VLC, and a couple others. All of them were largely C programs. Qt and KDE just make me love C++ more and more. :-) Anyway, the ideal place I found to start in KDE apps is main.cpp, and follow the included files from there. Fortunately, ksnapshot isn’t that large, just 14 files (C++ and headers), and despite what I thought earlier, you don’t need to understand the entire source code to fix one bug.
I was using KDE’s inbuilt text editor Kate for browsing the source code, with its Documents tab it was quite nifty, though it lacks stuff like go to declaration and type of variable. Nevertheless it was a simple solution. One that didn’t require me to configure anything. Just jump in and start reading.
I usually keep a pencil and some paper in front of me that way I can draw diagrams (loosely based on UML ) and jot down points to remember. Some editors have different mechanisms to add comments, but I somehow prefer a more traditional pencil and paper.
Before attempting to solve bugs in KDE, I went through some of the tutorials and some parts of the API (Here is the ultimate reference) Look at this page for a walk through about what the bug was about. No point repeating it over here.
Now I’m just going to catalogue the information I gathered -
main.cpp – Fairly standard. The only out of the norm thing (for me) was the inclusion of custom command line arguments, but I presume that is the norm.
ksnapshotobject – Now, this was one odd class. It initially seemed like it should have been a namespace, as none of the public functions actually modify any of the variables. This is not quite apparent as they weren’t declared const. Maybe they should have been static member functions? Overall this class contained 6 data members and 3 additional protected member functions. They were fairly easy to understand. autoincFilename has been a subject of bug reports earlier.
ksnapshot – This class was derived from ksnapshotobject and kdialog. Multiple inheritance! I don’t really have any experience with multiple inheritance and generally tend to avoid it (For no apparent reason!) But over here it seems to be fine. This class is the heart of the application, and the largest.
Rest of the files – Stuff like regiongrabber, windowgrabber and snapshottimer didn’t really seem relevant to the bug I was trying to fix, so I didn’t really bother with them much.
Coming to the actual bug solving. After finding the save functions in ksnapshotobject, I decided to head over to them and understand what they do. (Keep the API documentation open!) The first two overloaded versions of save were fairly obvious, and both seemed to call saveEqual. Here is where it got complicated. I didn’t really know much about MIME types, apart from the basics, so when confronted by KMimeType::findByUrl I was fairly confused, even after reading the documentation. KMimeType is derived from KServiceType, which in turn is derived from KSycocaEntry, which made me read a lot more about the System Configuration Cache. None of this was really helping in solving the bug, though it did make me understand KDE better, which I suppose was the point!
After that I decided to skip the specifics and get the general idea. The saveEqual function just saves the file with additional checks. So, saving wasn’t really the problem. I headed over to the open slots in ksnapshot.cpp. And voilà! I had found the problem - both the slots (the first one doesn’t seem to be used anywhere - don’t take my word for it!) were opening a local file. This made me read a lot about KStandardDirs :-) They should be opening a local file only if the file wasn’t already saved; otherwise they should open the saved file. Logically.
Then came the actual coding process. Fixing the bug. My thought process was something along the lines of: I should check whether the file has already been saved (a bool variable, maybe?) and then open either it or a temp file. I really didn’t want to introduce new variables, as this wasn’t my code. Another thing I noticed was that fileOpen (the main variable in consideration) was a QString, and later on it was being implicitly converted into a KUrl. I generally tend to avoid implicit type conversions, and the code struck me as somewhat wrong. I decided to change it to a KUrl (a lot safer) and add a check. As this change was required in two functions, I encapsulated it in a function to avoid code duplication.
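The shape of that fix can be sketched in plain C++. This is not the actual ksnapshot patch: `SnapshotState`, `savedUrl`, `tempFilePath` and `urlToOpen` are hypothetical names, and `std::string` stands in for KUrl so the example has no KDE dependency. It only illustrates putting the "saved file or temp file?" decision in one helper that both slots can call.

```cpp
#include <string>

// Hypothetical stand-in for the state the dialog keeps about the snapshot.
struct SnapshotState {
    std::string savedUrl;      // empty until the user has saved the snapshot
    std::string tempFilePath;  // scratch copy used when nothing was saved yet
};

// Single place that decides which file an "open" slot should hand over:
// the saved file if there is one, otherwise the temporary copy.
inline std::string urlToOpen(const SnapshotState& state) {
    if (!state.savedUrl.empty()) {
        return state.savedUrl;
    }
    return state.tempFilePath;
}
```

Both open slots would then call `urlToOpen(state)` instead of each building the temp file path on its own.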
I tested it out. Fixed a couple of typos and I was done. :-) After that I proceeded to create a patch. I tend to subconsciously indent code as I’m reading it, so the patch I created (svn diff) had those changes as well. Not something I wanted. I landed up reverting the code to the original and making just the changes needed to fix the bug. Still, some code indentation got transferred to the patch. Does anyone know how to fix this?
After that I added the patch to the original bug report (link) and crossed my fingers and waited. After a couple of days I got tired of waiting and landed up mailing the author. (Am I too impatient?) Hopefully, my contribution will be added to ksnapshot.
The entire process was quite enlightening, and was a great learning experience. I learned a lot about KDE and Qt. I think fixing bugs is an amazing way to get acquainted with a project. Especially one with such a huge code base. :-)
For the past week I’ve been learning about the Qt Framework, mainly cause I want to start contributing to KDE, instead of just bitching about what doesn’t work, or how awesome it is! :-P
In case you’re unaware, Qt is an application framework for developing cross-platform GUI applications. TrollTech started it in 1991, and Nokia bought it in June 2008. Qt applications can even be ported to Nokia’s Symbian platform for mobiles. KDE, started in 1996, has always been based on Qt. Here are some things I just love about Qt -
C++ - It’s based on C++. Not C, C++. I love C++ in comparison to other languages. It has its flaws, but I prefer it over others for doing large tasks. Otherwise there is always Python, which again Qt supports.
Signals and Slots Mechanism - This is one of the things that totally differentiates Qt from other frameworks, namely GTK+. Typically, other GUI frameworks implemented in C++ use some form of callbacks, which aren’t always type safe and have cluttered, difficult-to-use interfaces. Qt’s mechanism works wonders. The negative point is that it isn’t pure C++: Qt implements its own Meta Object Compiler, which runs before the compiler and implements the Signals & Slots mechanism among other things. Honorary mention - Boost Signals and Slots.
Objects - One thing I like about Java, which isn’t present in C++, is a supreme base class from which all other classes are derived. Qt implements this in C++ via QObject, and adds a further tree hierarchy to all its objects. This helps in memory management (read below), and allows easy type conversions.
Memory Management - One of the main things people dislike in C++ (I don’t - RAII) is the need for memory management. Some frameworks circumvent this by implementing their own memory management scheme, generally by overloading operator new and operator delete. Qt’s method is even better. It does nothing special, except that every object deletes all of its children when it is destroyed. Therefore the root QObject is generally allocated on the stack.
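A rough sketch of that parent/child ownership idea in standard C++ follows. `Node` is a made-up, simplified stand-in and not Qt's QObject (the real class also handles things like a child being deleted before its parent); the optional log exists only so the destruction order can be observed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a QObject-style tree (not Qt's real class).
class Node {
public:
    explicit Node(std::string name, Node* parent = nullptr,
                  std::vector<std::string>* log = nullptr)
        : name_(std::move(name)), log_(log) {
        if (parent) {
            parent->children_.push_back(this);  // parent takes ownership
        }
    }

    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    ~Node() {
        // Destroying a node destroys every child it owns, recursively.
        for (Node* child : children_) {
            delete child;
        }
        if (log_) {
            log_->push_back(name_);
        }
    }

private:
    std::string name_;
    std::vector<std::string>* log_;  // optional record of destruction order
    std::vector<Node*> children_;
};

// Builds a small tree with a stack-allocated root and returns the order
// in which the nodes were destroyed.
inline std::vector<std::string> destroyTree() {
    std::vector<std::string> log;
    {
        Node window("window", nullptr, &log);  // root lives on the stack
        Node* layout = new Node("layout", &window, &log);
        new Node("button", layout, &log);      // never deleted by hand
    }  // window goes out of scope here and takes its children with it
    return log;
}
```

Only the root is cleaned up by scope; everything allocated with `new` underneath it is released by its parent's destructor.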
Java Based iterators - STL is awesome. It really is, but it lacks usability in certain ways, namely deleting objects from a container. As iterators have no knowledge of the underlying container, an algorithm working on them can’t actually delete an element (check out std::remove). Java iterators, however, can. The best part is that Qt has both C++ and Java style iterators, which gives some features of Java while still harnessing the power of the Standard Algorithms.
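The std::remove point is easiest to see with the usual erase-remove idiom. The helper name `eraseAll` is made up for this example; `std::remove` and `std::vector::erase` are the standard library calls.

```cpp
#include <algorithm>
#include <vector>

// Removes every occurrence of `value` from `items` using the
// erase-remove idiom.
inline void eraseAll(std::vector<int>& items, int value) {
    // std::remove only sees iterators: it shifts the kept elements to the
    // front and returns the new logical end, but cannot shrink the vector.
    auto newEnd = std::remove(items.begin(), items.end(), value);
    // Only the container itself can actually drop the leftover tail.
    items.erase(newEnd, items.end());
}
```

Calling `eraseAll(v, 2)` on a vector holding 1, 2, 3, 2, 4 leaves 1, 3, 4. The two-step dance is exactly the usability gap a Java style iterator with its own remove operation hides.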
Copy-on-Write Mechanism - The Qt programmers took the lazy approach to copying objects (which is awesome!). Most of the classes in the Qt Core module are only deep copied when they are being modified, which means I can easily pass them as arguments and use them without any worries about additional overhead. Even their own container classes are implemented that way!
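A toy version of copy-on-write can be put together with `std::shared_ptr`. This is only an illustration of the idea under single-threaded assumptions, not how Qt's classes are written; `CowString`, `detach` and `sharesDataWith` are names invented for the sketch.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy copy-on-write string; an illustration, not Qt's implementation.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Copying is cheap: the compiler-generated copy only shares the pointer.

    const std::string& get() const { return *data_; }

    void append(const std::string& more) {
        detach();  // deep copy only if somebody else shares data_
        data_->append(more);
    }

    // True while both objects still point at the same underlying buffer.
    bool sharesDataWith(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

Copying a `CowString` and only reading from the copy never duplicates the text; the first `append` on either object is what triggers the deep copy.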
QString - 16-bit string support - Unicode. I know C++ provides this via wchar_t and std::wstring, but it’s ugly and isn’t that widely used. Apart from the 16-bit strings, it even provides many of the needed string searching functions - endsWith, startsWith, and yes! Regular expressions as well - QRegExp. Now I should mention that most of this, if not all, is provided by boost.
Everything C++ Should have had - Another thing I like about Java and Python, which is missing in C++, is a huge standard library. There is DateTime, RegExp, Networking, GUIs, URL handling, multi-threading, file-system monitoring, image handling, and many other things. All provided by Qt. Again there is boost, which provides everything stated and much more.
Container Classes - I know this is a minor thing, but I love the overloaded operator <<. It’s a lot simpler than calling push_back.
Foreach - Support for the “foreach” keyword. Something C++0x should fix, but I have no idea when that’s going to come out. Maybe in 2011?
I’m still exploring Qt, so there is a lot I don’t know. But so far I’m really impressed. It really integrates everything together. I can totally see why KDE chose Qt.
Next week, after KDE 4.4 is released, I’ll start going through the KDE code. :-)
I know I’ve been posting a lot of Emacs centric posts lately, but this is the last one. For now. I had always found Emacs to be this invigorating, powerful hacker tool which, once mastered, would make me a billion times more efficient. And despite everything, I still think that way, but it’s just not for me.
My Emacs journey was rocky. That’s the best way to describe it. Initially I was irritated by the key bindings, then I gradually came to accept them, then I started to love them, and now I hate other applications for not having them. Even though it’s been a couple of weeks since I used Emacs, I still find myself reflexively typing C-a or C-e to get to the start/end of a line. Damn useful!
I have a kind of love/hate relationship with Emacs. I love its key bindings. I hate its need for configurability. Most people shudder when they read this. People usually say configurability is the best thing about Emacs. You can make it exactly how you want it to be. You don’t like the key bindings, change them! You hate the interface, make it the way you want it, but at what cost? The cost is the need to learn a new programming language, and that’s not much of a big deal when you already know a couple of languages. Emacs Lisp isn’t that hard. I learnt the basics of it in a couple of days, but that’s not the point. The point is - you shouldn’t have to, just to get simple things to work. I don’t want to have to spend a couple of hours/days/weeks configuring it to work. If I want to, yes, I should be able to, but I shouldn’t have to.
I dislike the fact that I have to configure everything. Why can’t normal things be configured by default, and then additional tweaking be allowed? Simple things like line numbering and code completion aren’t enabled by default. And getting code completion to work is a major pain in the ass. I previously posted a couple of simple things that I had to spend a considerable amount of time configuring. That’s a minor indicator of the amount of tweaking required. The plus point is that once I have code completion working, I can tweak it to display the completions exactly how I like. Do I want it to display the return type? Or maybe I want it to show only the parameter names, not types. This can all be done relatively easily, once you understand how exactly Emacs works, and you understand the obscene amount of parentheses. (Emacs Lisp)
These days, I’ve been using Qt Creator. It has an excellent Qt help section, great debugging, nice auto-completion, and even has an integrated designer. Can I configure it to suit my specific needs? Probably not. But I can live with the default configurations. The case with Emacs is the exact opposite.
It’s a kind of short term/long term gain scenario. With usual editors, the investment period is relatively low, and so are the returns. With Emacs the investment period is gigantic, but the rewards are, apparently, monumental.
For now, I’ve stopped using Emacs for C++. For the kind of stuff I’m doing there are a lot better pain-free tools available, but I feel sad that the time I spent configuring Emacs has more or less gone to waste. If I’ve learned anything from this experience, it would be this - always include good defaults, and still cater to those advanced users who would like things to be their way.
I finally started to configure Emacs, specifically for C++ and Python. And let me tell you one thing - Emacs is NOT made for newbies or people who have used graphical user interfaces their entire lives. It will be a huge, and I mean huge, learning curve. Will it be worth it? Hell, I’m still waiting to find out. And here is another thing - if you do not know how to type properly, learn how to do so. Emacs key bindings are difficult enough without you being a two-finger typist who continuously looks at the keyboard. I mean it. Google some typing lessons, download some software, pray to the heavens or perform a ritualistic animal sacrifice. Whatever you do - learn how to type properly.
My initial plan to get Emacs working as a Python IDE was to jump right in, i.e. act like a script-kiddie. I don’t really understand Emacs Lisp, so most of the code presented was copied blindly. Copying something without understanding it does NOT go well with me. I have to understand what I’m copying; even if I don’t understand it completely, I should at least have a semi-concrete/vague idea of what it does. Unless you know Emacs Lisp, most of the code seems like gibberish initially.
After an hour of mindlessly copying scripts and adding them to the .emacs file, I was on the verge of screaming and pulling my hair. Not only did it not make any sense, but it didn’t really work. Code completion was a disaster, browsing files was a major pain in the ass, and basically everything was falling apart. Then, finally, after screaming “WHAT THE F*CK IS THIS?” a couple of times, I switched off Emacs ( C-x C-c ) and went to make myself a snack.
An hour later, after I had calmed down, I decided to restart my efforts. The earlier approach obviously wasn’t working, so I decided to go with a more systematic approach. I got a paper and pencil (Did I ever tell you about my fascination with mechanical pencils? I just don’t like pens!), wrote down exactly what was bothering me, and started to get those things fixed.
Appearance - Emacs22 looks horrible and the fonts are terrible! Call me shallow or whatever, but appearance matters to me. I’m not going to look at some ugly half-assed interface created by people who inherently do not like graphical user interfaces. Changing the fonts seemed like a daunting task, and a quick Google search revealed something called Pretty Emacs, which I promptly installed. Emacs looks pretty darn good now, and it uses the system fonts, which, did I mention, are a pleasure to read? A couple of hours later I realized that I had Emacs22 installed while Emacs23 was available. I had no idea why I had chosen not to install it, so I promptly installed it, and to my surprise, this version of Emacs uses the system font. That makes this first point totally redundant.
The retarded scroll bar on the left - I know most Emacs users don’t keep a scroll bar as they believe using a mouse is against their religion, but I like interfaces, and I do use the mouse! (set-scroll-bar-mode 'right) This should be included by default. As far as I can remember scroll bars have always been on the right.
Line numbering - All basic editors, including gedit and KWrite, provide line numbering. The first solution I found was M-x line-number-mode, but that just switched off the L indicator in the bottom panel. The Emacs wiki pages contained so much info that I was again lost in a never-ending mist of information. Questions like: okay, there is this .el file, but where do I put it? Do I have to, or is it included by default? If I need to include it, should I use the package manager or just download it? The two options I finally found were setnu-mode and linum-mode. The former was buggy and awkward, while the latter was exactly what I wanted. Now the question was how to enable this by default. I discovered that linum-mode has a global variant which can be switched on in the .emacs file.
(global-linum-mode t)
Copy Paste - I’m familiar with Emacs’s concept of killing and yanking text, along with the kill ring, but it should integrate itself with the operating system’s clipboard. While typing my blog I used to write the entire blog post, save it, open it with KWrite and then copy it to the Wordpress new post page. (setq x-select-enable-clipboard t)
Code browsing - Using C-x b and C-x C-b for managing files is not my idea of convenience. When you have been using graphical user interfaces for so long, you need to be able to see the directories. Fortunately, Emacs Code Browser came to my rescue. Installing it was a mere - sudo aptitude install cedet ecb. Then came the hard part. Learning how to use it.
The normal C-x o only seems to work when you’re on one of the auxiliary windows. Jumping from the editor to the auxiliary windows requires a lengthy C-c . g <e/d/h>.
Package management or manual download - This was another reason for my confusion. I just don’t know which philosophy to follow. A mixture of both, maybe? I’ve finally settled on manual downloads, as it’s easier to keep track of all the plugins I’ve downloaded. Plus, the repositories don’t contain all the plugins.
I don’t believe in customizing software from the start. I believe you should initially learn to use it with its default behavior, and start customizing it only after you’re comfortable with it. Emacs doesn’t work that way. In order to get virtually anything to work, you need to customize it, and that involves editing a .emacs file, which isn’t provided by default, in case you’re wondering. For simple customizations it isn’t required that you know Emacs Lisp, but it’s a lot easier if you do.
I had already installed certain plugins using the Ubuntu repositories, and the repositories aren’t exactly up to date. For example, the Emacs Code Browser is at version 2.40, which was released in May 2009, yet as of today the repository provides version 2.32. So, I removed all the installed packages and decided to install them manually.
I decided to keep the install directory as .emacs.d/, as it already existed, and unpacked ecb, yasnippet (check out the amazing video), and cedet. This required minor configuring of my .emacs file, but by then I was getting used to it.
Code Completion - Getting this to work was what had initially caused me to go a little berserk, but this time I had already installed all the required packages, or so I thought. The Emacs wiki page has an auto-completion section. I thought auto completion was a synonym for code completion, and headed over there to install it. Unfortunately, that page is a little out of date, the commands are conflicting at times, and it didn’t really work. A little googling revealed a newer version, which had none of the earlier hassles. But this wasn’t what I was looking for. It doesn’t parse the header files and provide appropriate suggestions. It merely looks at the code in the current buffer (or all buffers) and suggests symbols from there. Ugh!
Eventually, I read the entire CEDET and Semantic guide, and realized that Semantic provides code completion. The global-semantic-idle-completion-mode is nice, and so is the semantic-speedbar option. Neither of them is parsing the standard header files (for C++) or the imports from Python, but they still do parse a lot of info. So, for now I’m content. I wonder if there is a way to link this up with auto-complete.
Control and Alt key - I know it’s suggested that you swap the Caps Lock and control, and that is what I had initially done, but it started to get annoying. My left pinky was constantly alternating between the TAB, CAPS LOCK, and SHIFT key. Something had to be done! I finally configured Caps lock into Alt (or Meta), Alt into Control. This works perfectly, and my pinky isn’t getting that strained. Yet. Here is the script I used.
Undo and Redo - The default key combination for undo is C-x u or C-_. Both of which are kinda cumbersome for an operation you perform quite frequently. Then, I accidentally discovered C-/ also acts as undo. :-) Unfortunately there is no simple redo mechanism. The current redo mechanism is to undo the “undoes”. This can be done using C-x followed by the undo command, which is C-/ for me. Still, it’s kinda confusing switching back and forth between redo and undo. A better option is to install the redo package, and map it to C-Shift-/. Here is a guide.
This is all I’ve done for now. As I continue making changes to Emacs, I’ll keep posting. Maybe this will help some lost newbie who feels as confused as I do. :-
I’ve previously ranted that one of the things I didn’t like about “The Dolphin File Manager” was its inability to open a Konsole window. Today, I learned it can.
A simple Shift + F4 opens Konsole in a new window, at the current location. Normal F4 loads a small terminal window inside Dolphin. Considering the number of times this has been searched for and asked, I really wonder why it’s not publicized. Oh well.
Have you checked out the Google Wave video? Isn’t it just amazing? The endless possibilities, right? Frankly, the collaborative editing tools were nice, but what really amazed me was the spell checker, and the char-by-char chatting mechanism. The amount of time I spend staring at the screen - ”**** is typing”. Ugh! Maybe this feature will be added to gTalk soon? Personally, I’m hoping Facebook developers will look into it and improve their chatting mechanism, or maybe they could just get their existing chatting mechanism to work flawlessly. Either option would make me happy. ;-)
The spell checker was a “Context based spell checker.” Just amazing! If any spell checker can automatically convert “Icland is an icland” into “Iceland is an island” then it’s freaking amazing!
But, then I went and tested it out myself. It didn’t seem that great. For one - it was kinda slow, and the “Icland is an icland” example didn’t really work. What did work was “It’s bean a long time”. Maybe it just needs some work. Wave is currently in the beta testing phase.
Anyway, the spell checker really got me thinking - why hasn’t it been done before? How hard could it be? It turns out English is an insanely difficult language. There are formal rules, but all of them, ALL of them, have exceptions. Still, somehow most of us have no problem speaking and reading it. And if we read a sentence like “I eat some cake yesterday”, we know it’s incorrect. (In case you didn’t spot the error, “eat” should be “ate”.)
The current model of spell checker consists broadly of two parts -
A routine for extracting words from the text.
An algorithm for comparing the extracted words against a known list of correctly spelled words (Dictionary)
The routine for extracting words, in English, isn’t that hard. For languages like Chinese, it is a lot tougher. The main part is the comparing algorithm, which isn’t that simple, but is doable. GNU aspell is one example. The problem with this kind of approach is that while it guarantees spelling correctness, that doesn’t necessarily mean the statement will make sense. A classic example is “Where have you bean?”. It’s correct according to the dictionary, but we know “bean” is an object. The correct word should have been “been”.
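The two parts listed above can be sketched in a few lines of C++. This is a bare-bones illustration and nothing like what aspell does: it assumes plain ASCII text, treats a word as a run of letters and apostrophes, and only reports unknown words without suggesting corrections. `extractWords` and `misspelled` are names made up for the example.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Part 1: pull lower-cased words out of the text.
// Assumption: a "word" is a run of ASCII letters and apostrophes.
inline std::vector<std::string> extractWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch) || ch == '\'') {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}

// Part 2: report every word that is missing from the dictionary.
inline std::vector<std::string> misspelled(const std::string& text,
                                           const std::set<std::string>& dictionary) {
    std::vector<std::string> unknown;
    for (const std::string& word : extractWords(text)) {
        if (dictionary.count(word) == 0) {
            unknown.push_back(word);
        }
    }
    return unknown;
}
```

With a dictionary containing "where", "have", "you", "bean" and "been", the sentence "Where have you bean?" comes back with no errors at all, which is exactly the weakness described above: every word is spelled correctly, and the checker has no idea the sentence is wrong.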
Google’s method is based on what they like to call “Large Statistical Language Models”. To summarize - they have combined their search engine and a spell checker. This is something most of us do all the time.
Their Natural Language Processing video gave me the impression that it is somewhat computationally expensive, though I’m sure I’m blatantly wrong about this. Their spelling engine apparently has its roots in Google Translator. I really don’t know how to look at it. On one hand, the concept is amazing; on the other, what good is a spell checker if it can’t correct a simple ‘ot’ into ‘to’, or even suggest it in its options? But then it is, as I said earlier, in the beta testing phase.
All this info unleashed my geek side, and I’ve decided to write my own spell checker using existing technologies. Open source rocks! I thought I’d use existing Natural Language Processing technologies to parse every sentence and then apply a “Parts of speech Identification algorithm” to it. Then use basic rules to check the sentence for correctness. For example - English is a Subject Verb Object language. So, maybe I could check if the subject and object are both plural or both singular. Simple stuff. Maybe.
I’ll post in a couple of days to update you on how miserably I’ve failed (or succeeded.) Wish me luck.
*Anyone reading this post should be warned that I'm really sleepy right now. I have this sudden urge to type about this not so random topic. I am sleepy and you shouldn't take this post too seriously. I'm just ranting ..*
Has anyone ever realized that our desktops in general are really static? There is nothing about them that changes that often. Okay, maybe the wallpaper. But how often does that happen? For me, it's about once in 2 weeks, and even then I usually change to one of the predefined ones that came along with KDE.
What about the general theme or colour of the windows? Well, once or twice in 6 months. What about you? Do you really change how your desktop looks that often? I'm pretty sure most people don't. Hell! If they change their wallpaper once a month it's a big deal. Seriously!
So, then why do developers go to so much effort to make sure everything is so changeable? Modular and extendable are common keywords. What's the point? The majority of users won't really be changing the location of their widgets (Windows or Linux) that often. So, really, why do developers go to so much effort to make everything so changeable? Just so that it sells?
*"Hey! Look at our desktop. We have all these crazy widgets and stuff."*
To be a little more specific, let's look at KDE's Widgets. They are absolutely fab!! So, "super kool", right? I can move them around, resize them, rotate them. All without losing any quality. Scalable vector graphics are amazing. But how often have I actually done that? Well .. uhm .. I think it would be... Well! "Used Rarely" is how Microsoft's Add Remove Programs would label it. (Do Microsoft systems still have that?)
My KDE desktop really hasn't changed much over the months. Who has the time to actually go about looking for amazing widgets, which do all this crazy shit, which don't really help at the end of the day? This is how it looks right now -
And if I post my desktop again in a couple of months, I'm sure it will have barely changed.
Okay, I've drifted off topic. My point is - Generally people have a very static desktop. It barely changes over time. I think it's time that changed. Now, I don't mean everyone should go change their wallpaper this second, their entire color scheme, and theme. I think it should change on its own. Periodically. But that's really difficult. Everyone has distinct tastes, and we painstakingly spend hours looking for an amazing theme. Some of us, actually make them. How the hell would some program know my likes and dislikes?
My answer is - It won't. But it sure as hell can know the weather outside. Maybe even your mood, if you tell it. So why can't our desktop in general change to reflect our mood or the weather or maybe some holiday?
We've got all this procedurally generated content going on, at least in the field of gaming. Why doesn't someone generate an algorithm to change the entire look of the computer based on the weather, or just based on some different colours? I'm sure it won't be that hard. Or is it?
I know KDE has a weather background thingy, where it changes the desktop based on the weather outside. But it sucks! I thought it would cross-reference the weather with random images across the internet, find hundreds of them, randomly pick one and set it as my background. Or maybe just assign 10 pictures per weather type, do a Google similar images search and randomly choose a background. Instead it just has 20 odd pictures which it keeps reusing. It gets really bland. With a normal wallpaper, you know it's not going to change unless you change it, and you live with it, but when it can change on its own and it keeps cycling through the same images, it irritates the crap out of you.
And here is one more thing. I know this is more of a GNOME vs KDE thing, and I know each has its pros and cons, but did the KDE people really have to make it that hard to change the entire look of the desktop? There are Icons, Colours, Window styles and loads of other stuff, and each of them has to be changed individually. I like GNOME's approach a lot better. Combine all of them, and then provide advanced customization options.
PS : Guess who's typing this in Emacs?
I’ve never really understood the hype behind Emacs (and Vi), but that’s probably because I’ve never had the patience to actually use them. I always plan to learn how to, and I do. I read the introduction part, familiarize myself with the keys, but that’s all that ever happens. Yesterday, I planned to start using it again because I read an article saying “How AWESOME it is”. I even started reading The Emacs Lisp Manual.
Whenever I would read “add this to your .emacs file to get this cool feature”, I would always wonder, “Where the hell is my .emacs file?” Some manuals say it’s in my $HOME directory, but I never actually found it. And trust me I searched a lot … a lot. Apparently, and I learned this yesterday, you’re supposed to create your own .emacs file initially. UGH .. why couldn’t they mention that anywhere? :-
Even though I’m somewhat familiar with the Emacs environment, I doubt I’ll actually use Emacs at all. Why?
My primary use for the IDE I use (CodeBlocks) is to write games. Usually in C++ along with SDL and OpenGL, and to do that in Emacs, here’s what I would have to do -
Setup CEDET - I hate programming without code completion. I feel like a drunk man, fumbling through the darkness. Honest! I know a couple of articles which explain how to set up CEDET, but I found it to be a somewhat laborious task. Probably cause I don’t understand Emacs Lisp.
Makefiles - I would have to set up a makefile for my game, and then manage it. CodeBlocks does this for me, beautifully. And then there is the advantage that when I want to compile the game on Windows, I can use the same CodeBlocks project file, with some minor adjustments to the directories of the libraries and header files.
Console Window - I love the way CodeBlocks always opens a new console window along with the GUI window. It really helps in finding bugs. I can output loads of info to the console window, instead of having to log it to files and all. I don’t really know how to do this in Emacs (I’ve learned never to say - “That won’t be possible with Emacs”. Everything is possible with Emacs … everything.)
IDE Setup - Setting up Emacs to look like an IDE.
Until I figure out how to do all of the above, I doubt I’ll ever actually switch. Hell, maybe Emacs just isn’t for me. Until then it will just be a toy I use off and on. Never for some real programming, though.
I’ve never really created a full fledged platform game. I’ve tried to, once, and I worked on it for about 6 months and I managed to create something resembling a game, if you count two characters running around many platforms with the ability to jump and bump into one another. Hehe, but then that was a long time back, and I think I’ve improved since then. At least I hope so!
Now that I’m entering the Assemblee competition, I’m giving platformers another chance, and this time I’m trying not to make my plans too grandiose. Let’s see how it turns out. I have about a month to go. :-)
One problem I encountered last time was managing the animations. I don’t mean basic sprites; those are relatively easy, a bunch of images shown consecutively to form an animation. I mean deciding which animation should be played when, usually according to the Player’s, Agent’s or Object’s state.
I thought there would be loads of information available on this topic, and that a simple Google search would reveal many articles and maybe even a couple of technical papers. Unfortunately, that wasn’t the case, and I only found one article. Or maybe I’m just searching for the wrong stuff :- Either way, this is a blog post about the different animation managing techniques I know about, and their pros and cons. I usually refer to animations as sprites (frame based), but the techniques aren’t necessarily limited to those. I think skeletal animation would work perfectly well.
1. Variable Based Approach - This is when you have a number of variables each depicting some area of the Object’s state. It’s somewhat like an ad-hoc state machine, having all its flaws and, more or less, none of its strengths. So, why would anyone use this? Beats me .. I did, but then that was because I didn’t know any better.
Look at the image for a better idea. When I had implemented this I didn’t think of using vectors, and it was pretty much a huge fiasco. Most of the states mentioned would be taken care of by the physics engine / velocity vector, and some of them are extremely game centric.
The benefit of this method is that it is quite easy to implement, and seems quite intuitive. The problem is that since multiple variables can be active at one point of time, the code jumbles into a huge mess of checks and if-else statements. And the entire logic is lost in translation.
2. State Based Approach - This is the more refined way of using the “Variable Based Approach”. The basic idea is that you have a Finite State Machine, where every state has a distinct animation. My states were generally implemented via classes, each with its own independent rendering and updating (some people prefer to call it tick) functions. Each state maintains its own logic for changing states. Some people prefer to implement their FSMs using a combination of enums and switch statements. Do whatever works for you.
Pros -
Easy to implement.
Logic is clearly defined by each state.
Separates the internal logic and animation beautifully.
Cons -
The code gets somewhat lengthy.
Not really that extensible and reusable.
Adding extra states requires changing the previous states’ logic.
I usually implement every state with its own class, derived from a generic State class.
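A minimal sketch of that class layout in Python might look like this. The state names, the `inputs` dictionary and its keys (`move`, `jump`, `landed`) are assumptions for the example, and `render` only returns the animation name instead of drawing anything.

```python
class State:
    """Generic base class; each concrete state owns one animation."""

    animation = "none"

    def update(self, inputs):
        """Advance one tick and return the next state (possibly self)."""
        return self

    def render(self):
        # A real game would draw the current frame here; this sketch
        # just reports which animation would be playing.
        return self.animation


class Idle(State):
    animation = "idle"

    def update(self, inputs):
        if inputs.get("jump"):
            return Jumping()
        if inputs.get("move"):
            return Running()
        return self


class Running(State):
    animation = "run"

    def update(self, inputs):
        if inputs.get("jump"):
            return Jumping()
        if not inputs.get("move"):
            return Idle()
        return self


class Jumping(State):
    animation = "jump"

    def update(self, inputs):
        # Stay in the air until the (hypothetical) physics code says we landed.
        if inputs.get("landed"):
            return Idle()
        return self


class Player:
    def __init__(self):
        self.state = Idle()

    def tick(self, inputs):
        self.state = self.state.update(inputs)
        return self.state.render()
```

Notice that adding, say, a `Crouching` state means going back into `Idle` and `Running` to add the new transitions, which is the last con listed above.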
3. Data driven Approach - This is my chosen method of implementing my Animation Manager, and I think it’s quite amazing. Instead of explaining how it works, I’m going to point you to the article I found regarding it -> link. I suggest you read the forum post discussing it as well.
The coding process was fairly simple and it took me just about an hour to implement the basic animation Engine. I still have to incorporate at least the basic optimizations.
Pros -
State change logic is read from a file. So, it’s totally re-usable.
Code length is relatively small. Approx 200 lines of code including file handling.
It’s been used in commercial games. So, I know for a fact there are no great hidden dangers.
I don’t really know any cons of this method, yet. Overall I think this approach is quite amazing, and I should mention it can be, apparently, used for skeletal animation systems.
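For anyone who just wants the flavour of it, here is a small Python sketch of the general idea: the state change logic lives in data, and the code only looks things up. The file format (`<current animation> <event> <next animation>`) and all the names are my own assumptions for this example; it is not the format from the linked article, and it is not my actual engine.

```python
import io

# Hypothetical transition file contents, inlined so the sketch is
# self-contained. In a game this would be read from disk.
TRANSITIONS_TEXT = """\
# comments and blank lines are ignored
idle  move  run
idle  jump  jump
run   stop  idle
run   jump  jump
jump  land  idle
"""


def load_transitions(stream):
    """Build a {(current, event): next} table from a text stream."""
    table = {}
    for line in stream:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        current, event, nxt = line.split()
        table[(current, event)] = nxt
    return table


class AnimationManager:
    def __init__(self, table, start):
        self.table = table
        self.current = start

    def handle(self, event):
        # Unknown (animation, event) pairs leave the animation unchanged.
        self.current = self.table.get((self.current, event), self.current)
        return self.current


manager = AnimationManager(load_transitions(io.StringIO(TRANSITIONS_TEXT)), "idle")
```

Adding a new animation here means adding lines to the file, not touching the code, which is where the re-usability comes from.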
Are there any other methods anyone uses for implementing Animation Managers?
It’s just 1am over here, and for some insane reason I’m really sleepy. But I was messaging a friend (SMS not email) and it got me thinking. I love T9. Yea, that’s right T9 - predictive text. It’s just amazing. I’m one of those people who doesn’t like the SMS lingo, and prefers to type full words and sentences instead of abbreviations and word fragments! (Yea - the whole short part of SMSs is wasted on me!)
Anyway, it got me thinking that even though T9 is great, it’s really old, and with the kind of phones we have nowadays (processing power), implementing a couple of advancements shouldn’t take up too much of the processor’s speed. I think. Here are some improvements which might be useful to freaks like me -
**Custom Word Capitalization -** When I add a custom word to my dictionary, I can choose if I always want it to be Capitalized. This would be really useful as most of the time I'm adding names of people and places.
**Letter 'I' Capitalization -** When the letter 'i' exists on its own, it gets capitalized.
**Spell Checker -** Sometimes I don't know the exact spelling of a word, and when I start to type it incorrectly, my phone makes a loud beep to indicate that no such word exists. Maybe it should do a spell check on what I was typing and suggest a word. This would be kinda hard to implement.
**Google Wave Spell checker suggester -** A couple of months back, when I first saw the 'Google Wave' video, what really impressed me was its spell checker. It automatically identified the difference between 'been' and 'bean' based on its usage. I think something like this could be done to make T9 suggest words based on what you're typing instead of its conventional statistical technique.
I think the last two would be kinda tough to implement. Not to mention a waste of time. Still .. it’s nice to fantasize once in a while. :-)
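The first two, on the other hand, look easy. Here is a rough Python sketch, assuming the standard phone keypad layout and only plain a-z words. The class and function names are hypothetical, and a real T9 engine ranks suggestions by frequency, which this ignores.

```python
# Standard phone keypad letters for digits 2-9.
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYS.items() for ch in letters}


def to_digits(word):
    """Key presses needed to type a word (letters a-z only)."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


class T9Dictionary:
    def __init__(self):
        self.by_digits = {}

    def add(self, word, always_capitalize=False):
        # Custom word capitalization: remember the preferred casing
        # at the time the word is added.
        stored = word.capitalize() if always_capitalize else word.lower()
        self.by_digits.setdefault(to_digits(word), []).append(stored)

    def suggest(self, digits):
        # Suggestions come back in the order the words were added.
        return list(self.by_digits.get(digits, []))


def capitalize_lone_i(text):
    """Letter 'I' capitalization: only touches an 'i' standing on its own."""
    return " ".join("I" if word == "i" else word for word in text.split(" "))
```

For example, after `add("delhi", always_capitalize=True)` the key sequence 33544 suggests "Delhi", and `capitalize_lone_i("i think i am right")` gives "I think I am right".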
I just rediscovered this blog I’d found a long time back. It’s called Stevey's Drunken Blog Rants and it’s a really old blog. By really old I mean 4-5 years old. I had originally found it when I was googling ‘The Singleton Design Pattern’ and reading up on why it’s so bad and shouldn’t be used. EVER! (Unless you’re building a logging class.) Here is a really nice summary, in case anyone is interested. Anyway, I really love his post about Writing Blogs, particularly why everyone should write one, and that inspired me to start blogging again. Just emptying the contents of my head into a not-so-great Wordpress widget.
Coming to the problem. What problem? I just said I’ll start writing, so what kind of problem can there be? Well there is one - Categorization. I hate it! I absolutely despise it. It forces me to write about a singular topic and not just go on and on about anything and everything. In a way I suppose that it’s good most people don’t do that. It helps search engines find stuff easily, but it hinders creativity. So for now I’m just going to write about whatever I feel like and later on - maybe - try to categorize it.
*The Bluetooth Part*
So, Bluetooth Discovery, that’s what the title says, and that is what I’m going to talk about, in a way. A couple of days back, I was chatting with my friend about something (I don’t really remember) when the topic of Bluetooth adapters came up and how they are quite cheap. I always thought they would be super expensive. Okay not super expensive, but at least kinda. Little did I know they just cost around 100 bucks. (INR) So after being amazed by this discovery, I planned to buy one for my desktop, but my friend beat me to it and got me one. It’s as big as a small pen-drive. Really neat!
One of the main reasons I wanted one was, so that I would be able to transfer stuff from my mobile (N73) to my comp and vice versa without having to connect it via the USB cable, which my brilliant rabbit has chewed up a little bit so it doesn’t always work. And now I can!
Once I was over the initial craze of being able to transfer stuff so easily, I started to think - “Hmmm .. what else could I do with Bluetooth”. And that’s when it hit me, that maybe … maybe I could use my mobile as a remote to control the music on my Desktop. That way I could lie down on my bed, which is about 7 feet from my comp, and change the music. Yes, I know, it’s super lazy and I would maybe just use it for a couple of days or weeks. But it seems like a really interesting thing to do! And I’m sure many people have done it before, but I don’t care!
I remember seeing a YouTube video of something similar a couple of months back, where this guy would stream in movies from his comp to the mobile via Bluetooth. I’m going to try to implement something similar to that, except that I’ll be streaming in my playlist.
**Programming Perspective**
From the looks of it I’ll probably need two applications, one which runs on my desktop and a second which runs on my mobile. Now what language should I use?
Desktop Application - I can probably use any programming language I want - Python, Java, C or C++. I think I’ll most probably use Python for now, as it’s quite simple and I don’t think I need a full blown application. Java is nice, but I don’t have much experience with Java (maybe that’s why I should use it.) Later on maybe I could use C++ and create a KDE Plasma widget. I can/should probably use the D-Bus IPC interface to access amarok/other-music-player.
Mobile Application - I only have 2 options here - Java or Python. Personally I prefer Python over Java, but I don’t really know what I’ll use - so let’s see.
Yea, well that’s about it for now. Maybe tomorrow I’ll actually start working on it, instead of just thinking what it could be!
I’ve been getting kinda sick of the sluggish speed Firefox is running at. (Firefox 3.0.13 – Kubuntu 9.04) So I finally decided to try Chromium, though being a major Firefox fan boy I was extremely reluctant to do so! I found this extremely simple guide at ubuntugeek. It barely took me 10 minutes to install, and the first impression was something along the lines of - “I don’t like this blue colour, why can’t it adapt to my GTK/QT theme, just like Firefox does!”. It turns out it can, but you just need to tell it to do so manually.
After the initial surge of hatred (Yes! I don’t like change, and I was determined to conclude that it is a horrible piece of trash.) I, much to my disappointment, actually started to like it. Here are a couple of things that impressed me -
**Speed -** The glorious glorious speed. WOW!!
**Theme Changing -** When changing the theme, I didn't have to restart the chromium-browser. I always have to restart Firefox, and even though it reloads all my tabs, I still don't like having to do it!
**Remember Password -** When you visit a site, enter your username/password, and click the 'login' button, Firefox instantly shows a small bar asking whether you want to save the password. Whereas chromium waits till the page has finished loading and then asks you! This seemed really logical and intuitive to me. I wonder why Firefox doesn't do it this way.
**New Tab Page -** After seeing Chromium's new tab page all I can say is that it seems kinda dumb to have a blank page when I open a new tab, especially since this is so much more convenient.
**User Interface -** I'm not a fan of this minimalist design, but it's really starting to grow on me! And I've actually (to my regret) started to like it. Firefox's interface seems kinda cluttered in comparison.
**Tab Handling -** I like the way the tabs open and close, and the fact that when I open a new tab it is placed right next to the tab which spawned it. Similarly when I close a tab, it automatically goes to the last accessed tab. This seems like a really intuitive thing to me. I wonder why Firefox doesn't do it by default (I know it has been done via Plugins).
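The tab handling rule is simple enough to sketch in a few lines of Python. This is only my guess at the behaviour as I see it from the outside, not how Chromium actually implements it, and the `TabStrip` class and its method names are made up.

```python
class TabStrip:
    def __init__(self, first_tab):
        self.tabs = [first_tab]      # left-to-right order on screen
        self.history = [first_tab]   # most recently accessed tab is last

    @property
    def active(self):
        # Assumes at least one tab is still open.
        return self.history[-1]

    def activate(self, tab):
        # Move the tab to the end of the access history.
        self.history.remove(tab)
        self.history.append(tab)

    def open_from_active(self, new_tab):
        # Place the new tab right next to the tab that spawned it.
        position = self.tabs.index(self.active) + 1
        self.tabs.insert(position, new_tab)
        self.history.append(new_tab)

    def close(self, tab):
        # Dropping the tab from the history makes the previously
        # accessed tab the active one again.
        self.tabs.remove(tab)
        self.history.remove(tab)
```

So with tabs a and b open, switching back to a and opening c from it puts c between a and b, and closing c lands you back on a.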
So far Chromium seems kinda amazing. I know it lacks loads of features, but HEY! it’s a work in progress. And the developer documentation seems kinda interesting as well! Not that I’ve ever bothered to read Firefox’s developer documentation, so I really shouldn’t be comparing.
Firefox is an amazing browser and I don’t think I’m in a position to leave it completely (There are too many extensions I use regularly), but ‘Chromium’ (or Google Chrome, specifics don’t really matter) is going to give it a major run for its money! But then that doesn’t really matter!! All that matters is that people should stop using that crappy Internet Explorer.
If only I could convince my parents to let go of it ..
I regularly use KDE (Kubuntu 9.10), and I prefer it over GNOME. But I still think it lacks many things -
1. File Manager - I know many die hard KDE users will swear by Konqueror, but I just don’t like it. Dolphin is really nice, and I’ve really come to like it after a while, but compared to Nautilus, it lacks the following features -
No Video Preview.
No 'Open Terminal Here' action - I know you can open the terminal for a folder or just press F4 for an embedded terminal. I just prefer the external *Konsole* window.
No Auto Preview. - Nautilus would automatically show video and picture files as thumbnails. With Dolphin I need to do it manually.
Does not directly open executables - I have to open the terminal and then do a `./name`.
No Sound Preview.
2. Kickoff Application Launcher - I absolutely hate it! It’s like the windows start menu. GNOME’s overhead menu was a lot better. One of these days I think I’ll write my own KDE Widget which resembles GNOME’s menu.
3. Widget Customization - I kinda like the System Tray and the Time to be on the top instead of the bottom. Now I can do this by adding a Panel and adding those Widgets, but they look horrible. They tend to hog up loads of space and expand unnaturally.
4. KPackagekit - I feel it’s clunky and requires loads of work. For starters - It doesn’t show how heavy the updates are and it doesn’t show the speed at which they are downloading. Again the gnome-update-manager is a lot better.
Hopefully these things will be added in the future.