MySQL in Amarok 2: The Reality (a rough translation)

Original article: http://amarok.kde.org/blog/archives/812-MySQL-in-Amarok-2-The-Reality.html

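This is an old post from 2008. I searched for it after seeing people remark on how much MySQL gets used in KDE, and came across this article. After reading it I suddenly felt like translating it. The ending really struck a chord with me: having built a few small things myself and faced real users, more or less, I know exactly how the author feels. It may be a little presumptuous to say so, but I can't help it; I want to make better things for everyone... After finishing the translation I revised it a bit and found a lot of embarrassing mistakes... Many passages are free translations, and corrections are welcome. Each paragraph is followed by the original English text for comparison.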

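There has been a lot of discussion recently about Amarok making MySQL its only SQL backend. A good part of it is fear born of uncertainty: some of it is simply people resisting change, and some of it is people not understanding the decision. Some of the discussion (particularly Adriaan's blog post) has been interesting and insightful, but misses why the change was made. This post tries to explain why the decision was made, what it means for an end user, and why you should have a sip of tea and relax.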

There has been a lot of chatter lately regarding Amarok's switch to MySQL as its only SQL backend. A decent amount is FUD: either by people simply pushing back against change, or by people that simply don't understand the decision. Some of it (particularly Adriaan's blog post) has been insightful and interesting, but misses the mark in terms of why this change was made. This post attempts to explain why this decision was made, what it really means for you the end-user, and why you should have a cup of tea and relax.

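First, I want to point out that what I said is that MySQL will be Amarok's only SQL backend. Amarok 2's collection system is very powerful. Look at how many different music sources there are, from Shoutcast, Jamendo, Magnatune, Ampache, and MP3Tunes, as well as local sources such as iPods and your local file system; in Amarok 2 they are all treated as equals. A collection is just a collection, limited only by the capabilities it declares it supports (and of course it can provide its own custom capabilities). There is also a Nepomuk-based collection option, although I don't think it is currently enabled. (Translator's note: the "I don't think" in this sentence left me quite confused.) So cheer up: this change only affects Amarok's internal SQL collection and not the other sources (although those sources can also store information in the SQL database if they want to cache it).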

I want to point out first that I said that MySQL is going to be Amarok’s only SQL backend. A2’s collection system is very powerful. Just take a look at how varied music sources from Shoutcast, Jamendo, Magnatune, Ampache, MP3Tunes, as well as local sources like iPods and your local file system, are treated as equals in A2. A collection is a collection, and is limited only by what capabilities it advertises it can support (and of course, it can supply its own custom capabilities). It’s not currently enabled, I don’t think, but there’s a Nepomuk-based collection option too. So take heart — this change only affects Amarok’s internal SQL collection, and not other sources (although those sources can store information in the SQL database if they wish to cache information).

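Since I've mentioned Nepomuk, let's also talk about another common question/demand/complaint: KDE already has this nice Strigi-Nepomuk thing going, so why don't we use it to scan music and store information? There are a couple of main reasons. The first is that Strigi and Nepomuk are optional, not required. (Update: Strigi is required, but Soprano isn't, so Nepomuk as a whole is still optional.) We can't count on users having installed them, and even if they have, we can't count on them having configured them correctly (and since we are going cross-platform, that is even less likely). The second reason is speed: Amarok's custom collection scanner is extremely fast and uses TagLib to pull out specific pieces of information. Strigi is slow by comparison (it calculates hashes of all files, which means it has to read each file in full) and extracts less information. (Update: According to the Strigi developer, and contrary to what is said on kde-apps.org, Wikipedia, and even the author's own home page, Strigi does not calculate file hashes by default. So a properly configured Strigi could be as fast as Amarok's built-in scanner, although I don't know whether it would extract all the information we need. If it is configured to calculate SHA1 hashes of all files, it will certainly be much slower.) On a local hard drive this may not be a big problem, but it becomes a huge one once networked storage comes into the picture, and that is a very common scenario. I have also heard, though I don't remember the details, that querying through Nepomuk is rather slow compared with an ordinary SQL database. In any case, remember that once the Nepomuk-based collection is finished, tracks coming from it will have their metadata changes saved back to Nepomuk. So the SQL collection is not a replacement for Nepomuk; the two are completely independent. (Update: I forgot to mention that a Nepomuk-based collection already exists. It was developed by a GSoC student over the summer. I'm not sure whether it will make the 2.0 release, but we Amarok developers like Strigi/Nepomuk and are excited by the idea of opening the application and having all your music available immediately with no prior configuration. But the SQL collection has its place too. As I said, they are complementary technologies.)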

Since I mentioned Nepomuk, it's time to discuss another common question/demand/complaint: KDE has this nice Strigi-Nepomuk thing going on…why aren't we using it for scanning music and storing information? There are a couple of main reasons. The first is that Strigi and Nepomuk are optional, not required. (Update: Strigi is required, but Soprano isn't, so Nepomuk as a whole is still optional.) We can't rely on the user installing them, and even if they are installed, we can't rely on the user to configure them properly (remember that we're going cross-platform, making it even less likely). The second reason is speed: Amarok's custom collection scanner is extremely fast and pulls out specific pieces of information with TagLib. Strigi is, by comparison, very slow (it calculates hashes of all files, which means it needs to read the entire file) and pulls out less information. (Update: According to the Strigi developer, and despite what is said on kde-apps.org, Wikipedia, and even the author's own home page, it does not calculate hashes by default. So it's possible that Strigi, if properly configured, could be as fast as Amarok's internal scanner, although whether it would pull out all necessary information, I don't know. If it's configured to calculate SHA1 hashes of all files, then it will indeed be far slower.) On a local hard drive, it may not be a big issue, but it sure is a huge issue when you throw networked storage into the picture, which is a very common scenario. I've also heard, though I don't remember specifics, that querying and such through Nepomuk is rather slow, compared to a normal SQL database. Regardless, though, remember that when the Nepomuk-based collection is finished, tracks sourced through a Nepomuk-based collection will have their metadata changes saved back to Nepomuk. So, it's not that the SQL collection is in place of Nepomuk; they are entirely independent. (Update: I forgot to mention that a Nepomuk collection already exists. It was developed by a GSoCer over the summer. I'm not sure what its status is as far as making the 2.0 release, but we Amarokers both like Strigi/Nepomuk and are excited about the idea of opening up the app and having all your music available right then and there with no pre-configuration. But there is a place for the SQL collection too. As I said: they are complementary technologies.)
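The speed argument above comes down to I/O: a whole-file hash has to read every byte of every track, while a tag reader can stop once it has the metadata. The Python sketch below illustrates this for MP3 files that start with an ID3v2 tag, whose 10-byte header ends in a 4-byte "synchsafe" size (7 significant bits per byte). It is a simplified illustration under those assumptions, not how TagLib or Strigi are actually implemented; `sha1_of_file` and `id3v2_declared_size` are hypothetical helper names, and the optional ID3v2.4 footer is ignored.

```python
import hashlib
from typing import Tuple


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> Tuple[str, int]:
    """Hash the whole file; returns (hex digest, bytes read). Must read every byte."""
    digest = hashlib.sha1()
    bytes_read = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read


def id3v2_declared_size(path: str) -> Tuple[int, int]:
    """Read only the 10-byte ID3v2 header; returns (declared tag size, bytes read).

    The declared size excludes the 10-byte header itself. Returns size 0 when
    the file does not start with an ID3v2 tag.
    """
    with open(path, "rb") as f:
        header = f.read(10)
    if len(header) < 10 or header[:3] != b"ID3":
        return 0, len(header)
    size = 0
    for byte in header[6:10]:  # synchsafe integer: 4 bytes x 7 significant bits
        size = (size << 7) | (byte & 0x7F)
    return size, len(header)
```

For a multi-megabyte track, `sha1_of_file` reads the entire file, while `id3v2_declared_size` reads 10 bytes, after which a tag reader typically needs only the declared tag bytes. On a network mount, that difference is paid once per file in the collection.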

Enough of that digression; now on to the main subject.

With those topics out of the way, on to the meat.

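First, it is important to understand two key facts. Number one: we are not database people. Sure, we can store data in a database and more or less come up with a working schema (translator's note: 模式 is my rendering of "schema". Since I always use the English word when discussing this with others (note on the note: I'm a database person), I went back and forth on whether to leave "Schema" untranslated, and in the end went with 模式), but none of us are gurus/wizards/Jedi/etc. That leads to fact number two: maintaining three databases was driving us crazy. Every small schema change had to be coded up for all three types of database. A schema change that is trivial for one database could be very hard (or even impossible) for another. People would report bugs we couldn't reproduce, only to find out it was because we didn't quite understand how a particular database behaved (or, in some cases, because none of the active developers used that type of database). And so on. So from the very beginning of Amarok 2 development (and in our fantasies during Amarok 1 development) we knew we wanted only one database.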

First, it is important to understand an important pair of facts. Number one: we are not database guys. Sure, we can store data in them, and more or less come up with a working schema, but none of us are gurus/wizards/jedis/etc. This leads in to number two: maintaining three databases was driving us crazy. Every time a minor schema change was needed, it had to be coded up for all three types of databases. Modifying a schema could be trivial for one database type, and super difficult (or impossible) for another. People would report bugs that we couldn’t reproduce, only to find out that it was because we didn’t quite understand how one database or another behaved (or in some cases, none of the active devs were using that type). And so on. So from the beginning of A2 development (and in our fantasies during A1 development) we knew we wanted just one database.
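To make the "one schema change, three databases" problem concrete, here is a minimal Python sketch. The `tracks` table, its columns, and the dictionary and function names are hypothetical, not Amarok's real schema. Even creating the table needs a different auto-increment spelling per engine, and making an existing column `NOT NULL` is a single `ALTER TABLE` in MySQL and PostgreSQL, while SQLite's `ALTER TABLE` cannot add a constraint to an existing column, so the table has to be rebuilt. Only the SQLite statements are executed here, via Python's built-in `sqlite3` module; the MySQL and PostgreSQL strings are shown for comparison only.

```python
import sqlite3
from typing import Sequence

# Hypothetical table for illustration only; not Amarok's actual schema.
CREATE_TRACKS = {
    "mysql": "CREATE TABLE tracks (id INTEGER PRIMARY KEY AUTO_INCREMENT, "
             "title VARCHAR(255), rating INTEGER)",
    "postgresql": "CREATE TABLE tracks (id SERIAL PRIMARY KEY, "
                  "title VARCHAR(255), rating INTEGER)",
    "sqlite": "CREATE TABLE tracks (id INTEGER PRIMARY KEY AUTOINCREMENT, "
              "title VARCHAR(255), rating INTEGER)",
}

# The same logical change ("rating may no longer be NULL") for each backend.
# Assumes no existing row has a NULL rating.
RATING_NOT_NULL = {
    "mysql": ["ALTER TABLE tracks MODIFY rating INTEGER NOT NULL"],
    "postgresql": ["ALTER TABLE tracks ALTER COLUMN rating SET NOT NULL"],
    # SQLite cannot change an existing column's constraints: copy into a new table.
    "sqlite": [
        "CREATE TABLE tracks_new (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title VARCHAR(255), rating INTEGER NOT NULL)",
        "INSERT INTO tracks_new (id, title, rating) SELECT id, title, rating FROM tracks",
        "DROP TABLE tracks",
        "ALTER TABLE tracks_new RENAME TO tracks",
    ],
}


def apply_statements(conn: sqlite3.Connection, statements: Sequence[str]) -> None:
    """Execute the statements in order, then commit (no error handling)."""
    for statement in statements:
        conn.execute(statement)
    conn.commit()


conn = sqlite3.connect(":memory:")
apply_statements(conn, [CREATE_TRACKS["sqlite"]])
conn.execute("INSERT INTO tracks (title, rating) VALUES (?, ?)", ("Intro", 3))
apply_statements(conn, RATING_NOT_NULL["sqlite"])
```

Multiply this by every schema revision and every supported backend, and the maintenance cost described above becomes easy to see.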

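(We did look at abstraction layers such as QtSQL. I won't comment on them much, since I didn't do the evaluation myself, but in general they turned out not to be flexible enough to handle all of our needs without some custom SQL coding (especially for tasks like schema changes), which rather defeats the point. If you really want to know more, or insist that they are good enough, ask eean; I think he did the evaluations.)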

(We did actually look at abstraction layers like QtSQL and others. I’m not going to comment on them much, as I didn’t do the evaluation, but in general they were found to not be flexible enough to handle all of our needs without doing some custom SQL coding (especially in the cases of things like schema changes), which kind of defeats the point. If you want to know more/want to insist that they are, try asking eean, as I think he did the evaluations.)

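Next we had to choose a database. At first glance SQLite seemed a good choice. With transactions it is quite fast. It is pretty stable (people who complain about odd MySQL bugs should talk to markey, the maintainer of the SQLite backend in 1.4, who can attest that SQLite has had its fair share). But a few problems eventually knocked it out of the running. The first was performance. Although SQLite performs fairly well for people with small collections, people with large collections who switched to the MySQL or PostgreSQL backend in Amarok 1 reported huge speed gains for operations involving complex or numerous queries, such as adding many items to the playlist, scanning files, and searching/filtering the collection. Since we want to serve users with large collections as well as those with small ones, and digital music collections are not getting any smaller, the speedup for users with large collections mattered a great deal. Many of our developers noticed large speed increases in their everyday use of Amarok 2 after switching to mysqle (as we call it, though that's not the official name), so the speed advantage carries over to the embedded server as well as the normal server. That was the first blow against SQLite.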

Now we had to choose the type. At first, SQLite seemed like a good choice. Using transactions, it’s decently fast. It’s pretty stable (those that complain about odd MySQL bugs should talk to markey, as he, being the SQLite maintainer in 1.4, can attest that SQLite’s had its fair share). However, there were a few problems that in the end knocked it out of the running. The first problem is performance. Although for people with small collections it performs fairly well, people with large collections that switched to the MySQL or PostgreSQL backends in A1 would report enormous speed gains when operations performing complex or many queries were performed, such as adding many entries to the playlist, scanning files, or filtering/searching in the collection. Since we want to accommodate users with large collections just as well as those with smaller collections, and since digital music collections aren’t getting smaller, the speed increase for our users with large collections was quite important. Many of our developers, after the switch to mysqle (as we call it, though that’s not the official name), have noticed huge speed increases in their day-to-day use of A2, so that speed increase is carrying through to the embedded server as well as the normal server. That was the first knock against SQLite.
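The remark that SQLite is "decently fast" with transactions refers to a well-known property: on a file-backed database every committed transaction has to be synced to disk, so committing once per row is far slower than wrapping a whole batch in one transaction. The sketch below shows the batched form with Python's built-in `sqlite3` module. The `tracks` schema, the `Row` tuple, and `store_scan_results` are hypothetical stand-ins for what a collection scanner might write, and the in-memory database used here does not reproduce the disk-sync cost, only the transactional behavior.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, int]  # (path, title, rating); hypothetical scanner output


def store_scan_results(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert every scanned track inside one transaction and return the row count.

    The `with conn` block commits once at the end, or rolls back the whole
    batch if any insert fails.
    """
    with conn:
        cursor = conn.executemany(
            "INSERT INTO tracks (path, title, rating) VALUES (?, ?, ?)", rows
        )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")  # use a file path to see the real commit cost
conn.execute(
    "CREATE TABLE tracks (id INTEGER PRIMARY KEY, path TEXT UNIQUE, "
    "title TEXT, rating INTEGER)"
)
scan_results = [(f"/music/track{i:04d}.ogg", f"Track {i}", 0) for i in range(1000)]
inserted = store_scan_results(conn, scan_results)
```

Calling `conn.commit()` after each individual insert would produce the same final table, but with one disk sync per track on a real database file.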

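The other reason we dropped SQLite was entirely different. Many users (myself included) have several computers sharing a single Amarok database. Provided all the computers can access the music at the same mount point (and a few other things are configured correctly), this lets you scan once and play anywhere, update the same ratings no matter where you play a track, and more. Even if you don't share the database among several computers, many users want it stored on a particular server for speed, security, or backup reasons. If you think these are not common use cases, you are quite wrong. MySQL and PostgreSQL handle this kind of workload happily. For SQLite it is a complete no-go, simply because it was designed for a different purpose. So SQLite took two heavy blows. K.O.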

The other blow for SQLite came for a totally different reason. Many users (myself included) have multiple computers sharing a single Amarok database. Assuming all the computers have access to the music at the same mount point (and a few other things are configured right), this allows you to scan once, play everywhere, update the same ratings no matter where you play it, and more. Even if you aren't sharing the database among multiple computers, many users want their database stored on a particular server for speed, security, or backup reasons. If you think either of these isn't a common use-case, you'd be quite wrong. MySQL and PostgreSQL were quite happy with this workload. It's a total no-go for SQLite, simply because it's designed for a different purpose. So SQLite had two big knocks against it. K.O.

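However, just as we can't count on users setting up Strigi/Nepomuk correctly, we can't count on them setting up their tables in MySQL or PostgreSQL. So we needed a database that could be embedded, so that it simply works without any setup on the user's part. MySQL, with libmysqld, had the beginnings of this in the 4.1 series, works decently in 5.0, and is becoming fully supported (as far as I know) in 5.1. PostgreSQL, by contrast, has nothing of the sort. (It does have an interesting and cool concept of its own called embedded SQL. Update: apparently that is part of the SQL standard. Still pretty cool, and still something completely different from what we mean when we talk about an embedded server.)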

However, just as we can’t rely on the user to set up Strigi/Nepomuk correctly, we can’t rely on them to get their tables set up in MySQL or PostgreSQL. So we needed the database to be embeddable, so that it could just work for the user without any setup necessary on their part. MySQL, with libmysqld, had the seeds of this in the 4.1 series, it works decently in 5.0, and it’s becoming fully supported (AFAIK) in 5.1. PostgreSQL, on the other hand, does not have any such thing. (They have an interesting and cool concept of their own of embedded SQL though. Update: apparently that is part of the SQL standard. Still pretty cool. Still totally different from what we mean when we are talking about an embedded server.)

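That leaves us with, as you guessed, MySQL. It may not be everyone's favorite database (although it is for plenty of people), and I don't know how much overhead it really has in embedded form, but it fits the bill. It can be embedded, and it can also run standalone on the local machine or on a separate one (yes, Amarok 2 doesn't support this yet, but it will). It is fast and robust with large collections. The development team understands it well. And most importantly, it is a single-backend solution that meets all of our needs.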

So this leaves us with — as you guessed — MySQL. It may not be any particular person’s favorite database (although it is for plenty), and I don’t know how much overhead it really has in embedded form, but it fit the bill. It’s both embeddable and can run standalone on the local or a separate machine (yes, this is not supported yet in A2, but it will be). It is fast and robust for large collections. It is well understood by the development team. And most of all, it is a single-backend solution that fills all of our needs.

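If you are still unhappy with our decision, I'm sorry. We try to please most people, but we can't please everyone. But we are the ones who develop and support this thing, so we made the decision based both on our needs as developers and on the real-world use cases drawn from the collective feedback of the thousands of users who have contacted us over the last few years. Please remember that even if most of the comments on the Dot, or on this post (that is, most of the suddenly visible feedback), come from people unhappy with the decision, it is a decision that will in fact suit the vast majority of our users better than the other options we currently have.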

If you’re still unhappy about our decision, I’m sorry. We try to please most and can’t please everyone. But we’re the ones that develop and support this thing, and so we made a decision based both upon our needs as developers and the real-world use-cases from the collective feedback of thousands of users that have contacted us over the last few years. Please remember that even if most of the comments on the Dot, or to this post, (i.e. much of the sudden visible feedback) are from people that are unhappy with our decision, it is a decision that will actually suit the vast, vast majority of our users better than the other options we currently have.

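Our project is known for treating its users well: we listen to them, try to implement the features they want, and try to be responsive with support. That is one of the things that got us where we are today. So please, dear readers, put some faith in us. This was not an easy decision. We discussed, we argued, we threw things, we made up, we had an after-the-make-up orgy or two, but in the end we all felt this was the right way to go, and we believe that in the long run it will make Amarok even better. Hopefully you will feel the same way.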

We’re a project that is known for being good to our users — we listen to them, we try to implement features they want, try to be responsive with support. It’s one of the things that got us where we are today. So please, dear readers — put some faith in us. This has not been an easy decision — we’ve discussed, we’ve argued, we’ve thrown things, we’ve made up, we’ve had an after-the-make-up orgy or two — but in the end it’s what we collectively felt was the right way to go, and we feel that, in the long run, it will make Amarok even mores awesomer. Hopefully you’ll feel that way too.

This entry was posted in KDE.
