2013年7月5日星期五

博客迁移到Jekyll

Google Reader都关了,blogger还会远吗?

花了一个周末加四个晚上,终于把博客迁移到Jekyll上了。参考了几个网站,设计了一个简单的Jekyll模板,颜色以黑白为主,风格有点复古。没有使用工具来转换已有的日志,因为希望能够尽可能地保留格式。每一篇都是手动修改的。迁移博客真是个体力活,以后不想再做第二次了。

新的博客由GitHub Pages托管,地址为:http://magic003.github.io/。稍后会把minjiezha.com域名绑定到新的博客上,仍然可以使用http://www.minjiezha.com访问,RSS为http://www.minjiezha.com/atom.xml。

此博客停止更新。

2012年5月18日星期五

创建了Linktuned

我第一次产生创建这样一个网站的想法要追溯到一年半之前,2010的12月份。

我平时使用delicious,twitterreaditlater这三个服务来保存我认为有价值的链接。在在线书签比较火热的那段时间,我使用delicious保存书签。后来社会化网络兴起了,在twitter上读到有趣的链接,我会retweet或者favorite。近来readitlater用得比较多,因为只需要点击一下就可以保存链接实在是太方便了。可是,随着链接越来越多,当我想要找回印象中读过的某些文章,我只是零星地记得几个关键字,却搞不清我到底保存在哪里了。对于tag,我从一开始就不喜欢这个东西。因为对于同一个内容,我可能使用不同的词去描述,天知道我当时用了什么词汇。所以,我需要一个通过关键字就能帮我找回链接的工具。

我在google上搜索了一段时间,但是没有找到这样的服务。于是,我产生一股强烈的冲动,写一个网站,从各种来源收集链接并提供全文搜索的功能。我带着满腔热血向一些朋友征求了意见,却败兴而归。几乎没有人认为这是一个好主意。有人说自己没有这样的需求,根本不会去用;有人说对此兴趣不大;有人说你可以出于兴趣做一下,但是不会有太多人使用。这严重打击了我的积极性。

几乎在同一时间,Yahoo宣布将出售delicious。两天之后,我读到一篇介绍Trunkly的文章。这个网站跟我的想法如出一辙(虽然我们对于长远的计划不尽相同)。收集链接,并提供全文搜索。在它刚发布的时候几乎还是个初始版本,甚至页面中还会显示中文乱码。


我不喜欢重复别人做过的事情,刚好那段时间正忙于一些私事,所以放弃了。

将近一年之后的一个上午,当我几乎忘却这件事情的时候,一条消息让我心潮澎湃了许久:Trunkly被delicious收购了。这至少证明了我当时的想法并没有那么糟糕。经过一年的发展,Trunkly趋于完善,但是它就要在几个月后关闭了。这一次,我再也无法抑制内心的冲动。我要自己去做一个网站,把我的想法都实现出来。于是我找到了Andox,“我们来为自己写一个软件吧,即便将来没有人会用它。……每个人总想着等到一个绝佳机会再跨出那一步,可是在它出现之前,何必宁愿浪费时间而不去尝试稍逊一点的机会呢?”我就这样把他拉下水了。4个月后,当原型基本完成的时候,Steven也加入了我们。

对于在线书签的认识,我跟Trunkly是有很多共识的。上网的时候,当我发现有意思的链接时,我会在twitter中retweet或者在readitlater保存链接供以后阅读。在分享的同时我已经保存了链接。所以,没有必要先拷贝地址再保存到某个地方。保存链接应该是在浏览互联网时自然而然发生的。同时,它不是现有的在线书签或社会化网络的替代,更不应该改变用户当前的使用习惯。它只是默默地收集着用户感兴趣的链接,并很容易地把它们找回来。这是最核心的目标,所以我把我们的网站叫作Linktuned,意思是时刻关注着你的链接。

当然,我与Trunkly也存在着不同的看法。我不打算在Linktuned中建立好友或者关注的关系,至少在近阶段是如此。通常,我们假定好友之间比较容易具有相同的兴趣爱好,所以通过建立好友关系来更好地获取和传播信息。但是在我看来,用户保存的链接就已经是一个很好的资源,能够帮助我们推测用户的特征和兴趣,并且,用户已经在twitter和facebook上建立了好友关系,没有必要在这里重建一次。


单纯的在线书签服务并不是我们对于Linktuned的终极目标。我们有一些长远的计划,比如利用用户保存的链接,帮助他们找到更多有价值的信息。当然,这是很长久的事情,我们现在只会集中于当前的目标。

很高兴,我们很快就要启动这个网站了。希望它真的能给一些人带来便利。


You can read the English version from here: Why Linktuned.

2011年12月13日星期二

Domain name changed

My current minjiezha.info domain name will be expired soon. I decided to use a new one minjiezha.com, and it will take effect in 24 hours.

2011年7月28日星期四

Lazy CEK Machine

In the CEK Machine, though the substitution for an identifier is delayed, it doesn't mean it is a lazy machine. As described in Eager and Lazy Evaluation, a lazy evaluation strategy doesn't evaluate an argument before applying it to a function. However, in the CEK machine, it always evaluates the argument to a value, so it is still a call-by-value machine. If you are careful enough, you may already notice whenever an environment is created, the expression in its closure is always a value. The real power of environment is working with lazy evaluation. So in this post, I am going to implement the call-by-name and call-by-need strategies for the CEK machine.

Call by Name
It is quite easy to implement the call-by-name strategy based on the current call-by-value one. Since the machine no longer evaluates the argument before applying it to a function, the reduction rules related to applications should be modified(For a primitive operator, the operands are needed at the moment it is evaluated, so it stays unchanged.). In the call-by-value machine, when the control string is an application, first an ArgKK continuation is handled, then a FunKK continuation is handled. Now, the two steps, 2.1 and 2.2, should be combined into one, and the FunKK continuation type is removed.

2) If the control string is a value(abstraction or constant), check the type of continuation:
    2.1) If it is ArgKK, pop up the continuation, create a new environment that maps the function parameter to current closure, and set the function body and the new environment as the current expression and environment respectively;

However, the machine may need to evaluate the same argument many times now. Take expression (lambda x  + x x) ((lambda y y) 1) as an example, x is mapped to ((lambda y y) 1) after reducing the outer application. Then, when evaluating + x x, ((lambda y y) 1) is evaluated twice. The reason is that a parameter may be used several times in a function body, and whenever it is needed, the original argument it maps to should be evaluated. What if remember the result of the argument for the first time it is evaluated, so it doesn't bother to re-compute it later? I will talk this in next section.

Call by Need
The idea is quite straightforward. Remember the value for a closure, so next time the closure is needed, the value is retrieved directly. Considering the same example (lambda x + x x) ((lambda y y) 1), when the first x in + x x is needed, ((lambda y y) 1) is evaluated and the result is 1. After this, the environment which maps x to ((lambda y y) 1) is updated so x is mapped to 1. When the second x is needed, 1 is retrieved from the environment and no extra evaluation is performed.

However, the implementation is not that simple. When evaluating ((lambda y y) 1), I can't simply put it as the control string of current machine, because this may destroy the machine state and the machine can't go back to previous state after it is evaluated. Instead, a separate evaluation process should be performed on it. This is kind of a sub-machine, whose evaluation logic is the same, but it relies on some states from the parent machine, such as environments because the expression itself may contain free variables. In current evaluate() function, it initializes the machine state at the beginning and clean it up at the end. It was designed like this because no sub-machine was ever needed before. I am going to refactor it so it is fed with a machine state and focuses on the reduction logic, instead of managing the lifecycle of the machine state. One advantage is an expression can be put into a particular context for evaluation. It is more flexible. The interface is changed to:

// file: eval.c
static int _evaluate(State *state);

TreeNode * evaluate(TreeNode *expr) {
    Environment *globalEnv = buildGlobalEnvironment();
    State *state = cek_newState();
    state->closure = cek_newClosure(expr,globalEnv);

    TreeNode* result = NULL;
    if(_evaluate(state)) {
        // The result control string may contain free variables, need to 
        // substitute them using the environment for it.
        TreeNode *tmp = resolveFreeVariables(state->closure->expr,state->closure->env);
        if(tmp!=NULL) {
            // actually, tmp and state->closure->expr are the same expression
            result = tmp;
            state->closure->expr = NULL;
        }
    }
    cek_cleanup(state);
    return result;
}

And the code for looking up an identifier is:

// file: eval.c
static Closure* lookupVariable(const char *name, Environment *env) {
    if(env==NULL) return NULL;
    if(strcmp(name,env->name)==0) {
        Closure *closure = env->closure;
        if(closure->env!=NULL) {
            // evaluates the expression in its environment
            State *state = cek_newState();
            state->closure = cek_newClosure(duplicateTree(closure->expr),cek_newEnvironment("",cek_newClosure(NULL,NULL),closure->env));

            if(_evaluate(state)) {
                deleteTree(closure->expr);
                cek_deleteClosure(closure);
                // save the evaluated result
                env->closure = cek_newClosure(state->closure->expr,state->closure->env);
                state->closure->expr = NULL;
            }
            cek_cleanup(state);
        }
        return env->closure;
    }
    return lookupVariable(name,env->parent);
}

More Words about Closure and Environment
Another design improvement I want to mention is the global environment. Currently, if the control string is an identifier, the machine first tries to look it up in the environments and if not found it tries to resolve it from the builtin and standard libraries. Since a library maps identifiers to expressions, it is just like an environment. So the identifier lookup can be treated universally by introducing a global environment, which contains all the identifiers and functions provided by the evaluator. It is the root environment for all environments that are created during the evaluation and it is initialized when the machine is started. Thus, to handle a control string which is an identifier, the machine only needs to look it up in the environments. If it is a builtin or standard function, the machine can finally find it in the global environment.

A closure is an expression associated with an environment. The environment is where the expression is defined rather than it is evaluated. Let's consider the reduction steps of expression (lambda x  (lambda y (lambda x + x y) 2) x) 1.

(lambda x (lambda y (lambda x + x y) 2) x) 1
  => (lambda y (lambda x + x y) 2) x)                    A = {x=>(1,G), G}
  => (lambda x + x y) 2                                          B = {y=>(x,A), A}
  => + x y                                                               C = {x=>(2,B), B}
  => + 2 y                                                               C = {x=>(2,B), B}
  => + 2 1                                                               C = {x=>(2,B), B}  *
  => 3

The result is 3 rather than 4, because at step (*), y is mapped to x, and x is looked up in environment A, where it is defined, but not the latest environment C where is evaluated.

You can get the source code in branches call-by-name and call-by-need via github.

2011年7月23日星期六

CEK Machine

In the textual machines I mentioned in previous posts, when reducing an application, without exception, everyone requires the substitution of an argument in place of all the occurences of an identifier in the abstraction body. The machine traverses the expression tree during substitution, and if the result is not a value, it needs to re-traverse it again to evaluate it. Take (lambda x + x x) 1 as an example, + x x is traversed to replace x with 1. After that, we get result + 1 1. To evaluate it, the expression + 1 1 is traversed again. It is inefficient to traverse an expression twice. We cannot get rid of this if we stick on the substitution when reducing an application. We can, however, remember the identifier and the argument at the moment and delay the substitution until it is actually needed.

To represent this delay substitution, a component that maps an identifier to an expression is needed. It is called an environment. Every expression in the machine is replaced by a pair, whose first part is an expression probably with free variables in it and the other part is an environment. It is called a closure. Of course, the environment could not map identifiers to expressions, because the machine never works on a single expression. It should map identifiers to closures. And the continuation also contains a closure. Now, the machine state changes to a control string, an environment and a continuation, so it is called a CEK machine. Here is the data structures introduced for environment and closure:

// file: cek_machine.h
struct envStruct {
    char *name;
    struct closureStruct *closure;
    struct envStruct *parent;
};

struct closureStruct {
    TreeNode *expr;
    struct envStruct *env;
};

typedef struct continuationStruct {
    ContinuationKind tag;
    Closure * closure;
    struct continuationStruct * next;
} Continuation;

Conceptually, an environment can contain several identifier-closure mappings. However, in the definition above an environment only has one such mapping. Since mappings are created when reducing applications and an abstraction only has one parameter, I keep one mapping in a standalone environment and arrange them hierarchically. When looking up for the mapped closure of an identifier, first try it in its associated environment, and if failed, try it in the parent one.

CEK Machine
There is no significant change to continuation actions. Some new actions, like creating a closure, deleting a closure, creating an environment, looking up in an environment, are very common in the reduction steps. By the way, an identifier is no long a value because it may be mapped to an expression, which can be further reduced. Here are the reduction rules:
1) If the control string is an identifier, look up the mapped closure in the environment, and set the expression and environment of the closure as current expression and environment respectively;
2) If the control string is a value(abstraction or constant), check the type of continuation:
    2.1) If it is FunKK, pop up the continuation, create a new environment that maps the function parameter to current closure, and set the function body and the new environment as the current expression and environment respectively;
    2.2) If it is ArgKK, set the continuation as FunKK, and switch the current expression and environment with these in the closure of continuation respectively;
    2.3) If it is OprKK, pop up the continuation, evaluate the primitive expression, and set the result as the current expression and set the current environment to null;
    2.4) If it is OpdKK, set next argument expression and its environment as current expression and environment respectively;
3) If the control string is an application, create and push a continuation of type ArgKK, whose closure consists of the argument expression and current environment, and set the abstractioin expression as current expression, with current environment unchanged;
4) If the control string is a primitive, create and push a continuation of type OpdKK, whose closure consists of the primitive expression and current environment, and set the first argument expression as current expression, with current environment unchanged.

Though implementation is straightforward according to the reduction rules, the code is getting complicated, so I am not duplicate it here. You can still access it on github.

A drawback of the CEK machine is no intermediate expression can be obtained after each step. What a pity! The machine always delays the substitution and expression is not actually reduced after each step. It does less work at the expense of losing immediate effects.