Dictionary lookups in glusterfs

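glusterfs is a distributed file system, but unlike many distributed file systems it has no metadata server; I have heard that swift takes a similar approach. In glusterfs, the configuration of every xlator is managed through a dict. Put plainly, a dict is a hash table: an in-memory key/value store. I spent some time today studying how glusterfs designs it, and found it quite interesting.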

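My previous post covered the design of the glusterfs memory pools, and the dict is one place where those pools are put to use. When glusterfsd initializes, it creates three pools: dict_pool, dict_pair_pool, and dict_data_pool. Let's see how the dict code uses them.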

1. Before a dict can be used, a dict object has to be created, which is an object-oriented way of thinking.

 dict_t *
 get_new_dict (void)
 {
         return get_new_dict_full (1);
 }

glusterfs calls get_new_dict to create a dict object. So what does get_new_dict_full, which it delegates to, actually do?

 dict_t *
 get_new_dict_full (int size_hint)
 {
         dict_t *dict = mem_get0 (THIS->ctx->dict_pool);

         if (!dict) {
                 return NULL;
         }

         dict->hash_size = size_hint;
         if (size_hint == 1) {
                 /*
                  * This is the only case we ever see currently.  If we ever
                  * need to support resizing the hash table, the resize function
                  * will have to take into account the possibility that
                  * "members" is not separately allocated (i.e. don't just call
                  * realloc() blindly.
                  */
                 dict->members = &dict->members_internal;
         }
         else {
                 /*
                  * We actually need to allocate space for size_hint *pointers*
                  * but we actually allocate space for one *structure*.  Since
                  * a data_pair_t consists of five pointers, we're wasting four
                  * pointers' worth for N=1, and will overrun what we allocated
                  * for N>5.  If anybody ever starts using size_hint, we'll need
                  * to fix this.
                  */
                 GF_ASSERT (size_hint <=
                            (sizeof(data_pair_t) / sizeof(data_pair_t *)));
                 dict->members = mem_get0 (THIS->ctx->dict_pair_pool);
                 if (!dict->members) {
                         mem_put (dict);
                         return NULL;
                 }
         }

         LOCK_INIT (&dict->lock);

         return dict;
 }

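size_hint is the size of the dict to allocate, that is, the number of hash buckets (it is stored in hash_size). When size_hint is 1 there is only one bucket, so all of the data in the dict ends up on a single linked list (chaining is how the dict resolves hash collisions).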
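To see why a size_hint of 1 collapses everything onto one list, here is a minimal sketch of the bucket selection. It assumes a stand-in hash: `toy_hash` is the classic djb2 string hash, not glusterfs's SuperFastHash, and `bucket_index` is a hypothetical helper name of my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SuperFastHash: the classic djb2 string hash. */
static uint32_t toy_hash(const char *key, size_t len)
{
    uint32_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33u + (unsigned char)key[i];
    return h;
}

/* Pick a bucket the same way the dict does: hash value modulo bucket count. */
static uint32_t bucket_index(const char *key, uint32_t hash_size)
{
    assert(hash_size > 0);
    return toy_hash(key, strlen(key)) % hash_size;
}
```

Whatever the hash function returns, any value modulo 1 is 0, so with a hash_size of 1 every key lands in bucket 0 and the "table" is really one chain.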

Next, how does the code add an entry to the dict? First, a look at the dict_t data structure:

 struct _dict {
         unsigned char   is_static:1;
         int32_t         hash_size;
         int32_t         count;
         int32_t         refcount;
         data_pair_t   **members;
         data_pair_t    *members_list;
         char           *extra_free;
         char           *extra_stdfree;
         gf_lock_t       lock;
         data_pair_t    *members_internal;
         data_pair_t     free_pair;
         gf_boolean_t    free_pair_in_use;
 };

dict_t has a lock member, and every operation on a dict_t object takes that lock first:

int32_t
dict_add (dict_t *this, char *key, data_t *value)
{
        int32_t ret;

        if (!this || !value) {
                gf_log_callingfn ("dict", GF_LOG_WARNING,
                                  "!this || !value for key=%s", key);
                return -1;
        }

        LOCK (&this->lock);

        ret = _dict_set (this, key, value, 0);

        UNLOCK (&this->lock);

        return ret;
}

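I have to say the glusterfs coding style is quite clean: it keeps the housekeeping apart from the core logic, which makes the code a pleasure to read. In the code above, the logging and the locking sit together in the wrapper, while the real work is done in _dict_set.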

 static int32_t
 _dict_set (dict_t *this, char *key, data_t *value, gf_boolean_t replace)
 {
         int hashval;
         data_pair_t *pair;
         char key_free = 0;
         int tmp = 0;
         int ret = 0;

         if (!key) {
                 ret = gf_asprintf (&key, "ref:%p", value);
                 if (-1 == ret) {
                         gf_log ("dict", GF_LOG_WARNING, "asprintf failed %s", key);
                         return -1;
                 }
                 key_free = 1;
         }

         tmp = SuperFastHash (key, strlen (key));
         hashval = (tmp % this->hash_size);

         /* Search for a existing key if 'replace' is asked for */
         if (replace) {
                 pair = _dict_lookup (this, key);

                 if (pair) {
                         data_t *unref_data = pair->value;
                         pair->value = data_ref (value);
                         data_unref (unref_data);
                         if (key_free)
                                 GF_FREE (key);
                         /* Indicates duplicate key */
                         return 0;
                 }
         }

         if (this->free_pair_in_use) {
                 pair = mem_get0 (THIS->ctx->dict_pair_pool);
                 if (!pair) {
                         if (key_free)
                                 GF_FREE (key);
                         return -1;
                 }
         }
         else {
                 pair = &this->free_pair;
                 this->free_pair_in_use = _gf_true;
         }

         if (key_free) {
                 /* It's ours.  Use it. */
                 pair->key = key;
                 key_free = 0;
         }
         else {
                 pair->key = (char *) GF_CALLOC (1, strlen (key) + 1,
                                                 gf_common_mt_char);
                 if (!pair->key) {
                         if (pair == &this->free_pair) {
                                 this->free_pair_in_use = _gf_false;
                         }
                         else {
                                 mem_put (pair);
                         }
                         return -1;
                 }
                 strcpy (pair->key, key);
         }
         pair->value = data_ref (value);

         pair->hash_next = this->members[hashval];
         this->members[hashval] = pair;

         pair->next = this->members_list;
         pair->prev = NULL;
         if (this->members_list)
                 this->members_list->prev = pair;
         this->members_list = pair;
         this->count++;

         if (key_free)
                 GF_FREE (key);
         return 0;
 }

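The SuperFastHash call computes the hash value, and the modulo by hash_size on the next line narrows it down to a bucket index. The `if (replace)` block handles replacing the value of an existing key. The `if (this->free_pair_in_use)` branch that follows is the part I find most uncomfortable; I am not sure whether it points to a design problem. The rest of the function, from the key copy onward, inserts the new key/value pair into the hash bucket and into members_list.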
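To make the insertion path easier to follow, here is a stripped-down sketch of the same idea outside glusterfs. Everything named `toy_*` is hypothetical: it uses plain malloc/calloc instead of the memory pools, an int instead of data_t, a single bucket (as when hash_size is 1), and no locking or reference counting. It does keep the two things the real code does: the first pair lives inline in the dict (the free_pair trick), and each pair is linked both into its bucket chain and into the list of all members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_pair {
    struct toy_pair *hash_next;   /* next pair in the same bucket */
    struct toy_pair *prev, *next; /* links in the list of all pairs */
    char            *key;
    int              value;
} toy_pair_t;

typedef struct toy_dict {
    toy_pair_t *bucket;           /* the only bucket (hash_size == 1) */
    toy_pair_t *members_list;     /* every pair, newest first */
    int         count;
    toy_pair_t  free_pair;        /* inline storage for the first pair */
    int         free_pair_in_use;
} toy_dict_t;

/* Add a key/value pair without checking for duplicates; 0 on success. */
static int toy_add(toy_dict_t *d, const char *key, int value)
{
    toy_pair_t *pair;

    assert(d && key);

    /* The first pair reuses the inline slot; later ones are allocated. */
    if (d->free_pair_in_use) {
        pair = calloc(1, sizeof(*pair));
        if (!pair)
            return -1;
    } else {
        pair = &d->free_pair;
        d->free_pair_in_use = 1;
    }

    pair->key = malloc(strlen(key) + 1);
    if (!pair->key) {
        if (pair == &d->free_pair)
            d->free_pair_in_use = 0;
        else
            free(pair);
        return -1;
    }
    strcpy(pair->key, key);
    pair->value = value;

    /* Push onto the front of the bucket chain. */
    pair->hash_next = d->bucket;
    d->bucket = pair;

    /* Push onto the front of the doubly linked list of all members. */
    pair->next = d->members_list;
    pair->prev = NULL;
    if (d->members_list)
        d->members_list->prev = pair;
    d->members_list = pair;
    d->count++;
    return 0;
}
```

Starting from a zero-initialized `toy_dict_t`, the first `toy_add` stores its pair in `free_pair` and only the second one allocates. My reading is that the inline slot exists to save one allocation for dicts holding a single entry, which would explain the extra branching, but that is a guess on my part rather than something the source states.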

Now for the lookup operation.

 data_t *
 dict_get (dict_t *this, char *key)
 {
         data_pair_t *pair;

         if (!this || !key) {
                 gf_log_callingfn ("dict", GF_LOG_INFO,
                                   "!this || key=%s", (key) ? key : "()");
                 return NULL;
         }

         LOCK (&this->lock);

         pair = _dict_lookup (this, key);

         UNLOCK (&this->lock);

         if (pair)
                 return pair->value;

         return NULL;
 }

Again, the wrapper takes care of locking and other housekeeping first; _dict_lookup is what does the real work.

 static data_pair_t *
 _dict_lookup (dict_t *this, char *key)
 {
         if (!this || !key) {
                 gf_log_callingfn ("dict", GF_LOG_WARNING,
                                   "!this || !key (%s)", key);
                 return NULL;
         }

         int hashval = SuperFastHash (key, strlen (key)) % this->hash_size;
         data_pair_t *pair;

         for (pair = this->members[hashval]; pair != NULL; pair = pair->hash_next) {
                 if (pair->key && !strcmp (pair->key, key))
                         return pair;
         }

         return NULL;
 }

The lookup code is quite simple: compute a hash value, then walk one linked list.
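The same "hash, then walk the chain" pattern can be sketched on its own. This is an illustration under my own assumptions, not the glusterfs code: `node_t`, `toy_hash` (djb2 again, standing in for SuperFastHash), and `chain_lookup` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    struct node *hash_next;  /* next node in the same bucket */
    const char  *key;
    int          value;
} node_t;

/* Hypothetical stand-in for SuperFastHash: the classic djb2 string hash. */
static unsigned toy_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Find the node for key, or NULL: pick the bucket, then scan its chain. */
static node_t *chain_lookup(node_t **buckets, unsigned nbuckets,
                            const char *key)
{
    node_t *n;

    assert(buckets && key && nbuckets > 0);

    for (n = buckets[toy_hash(key) % nbuckets]; n != NULL; n = n->hash_next) {
        if (n->key && strcmp(n->key, key) == 0)
            return n;
    }
    return NULL;
}
```

With `nbuckets` set to 1 the modulo always selects bucket 0, so the cost of a lookup grows with the total number of entries, just as it would for a plain linked list.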

Looking through all of the glusterfs code, get_new_dict_full is almost always called with an argument of 1; only the dict_copy interface is different:

 dict_t *
 dict_copy (dict_t *dict,
            dict_t *new)
 {
         if (!dict) {
                 gf_log_callingfn ("dict", GF_LOG_WARNING, "dict is NULL");
                 return NULL;
         }

         if (!new)
                 new = get_new_dict_full (dict->hash_size);

         dict_foreach (dict, _copy, new);

         return new;
 }

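Judging from the code, this is the only place where the hash table actually comes into play; everywhere else dict_t is used as nothing more than a linked list. Even here it is not really used as a hash table: the copy merely turns a linked list into a hash table. This is the least sensible thing I have come across in glusterfs.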