Flyweight

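A Flyweight (sometimes also called a token or a cookie) is a temporary component that acts as a "smart reference" to something. Flyweights are typically used when there is a very large number of very similar objects, as a way of reducing the memory needed to store them all.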

11.1 User Names

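Suppose you are building a massively multiplayer online game. There will almost certainly be more than one user called John Smith. If we store every name as ASCII text, each of those users spends 11 bytes on it (10 characters plus a terminating null). Instead, we can store "John Smith" once and give each user a pointer to that single copy, which costs 8 bytes on a 64-bit system.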
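A minimal sketch of the "store once, keep a pointer" idea in standard C++. It assumes a single process-wide pool. Each distinct name is stored once in a std::unordered_set, and each user keeps only a pointer into it. `NamePool`, `intern` and `PooledUser` are hypothetical names used for illustration. The approach works because pointers to elements of an unordered_set stay valid when the set rehashes.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical pool: every distinct string is stored exactly once.
// Pointers to elements of std::unordered_set remain valid across rehashing,
// so handing out const std::string* is safe as long as nothing is erased.
class NamePool {
public:
    const std::string* intern(const std::string& s) {
        // insert() returns {iterator, inserted}; if s is already present,
        // the iterator points at the existing element.
        auto result = pool_.insert(s);
        return &*result.first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Each user stores a pointer (8 bytes on a typical 64-bit platform)
// instead of its own copy of the name.
struct PooledUser {
    const std::string* name;
};
```

Two users created with `pool.intern("John Smith")` end up holding the same pointer, and the pool contains a single "John Smith".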

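We can go further and split "John Smith" into two parts, a first name and a last name, each stored once. That way "Smith" is also shared with every other Smith, whatever their first name.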

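#include <cstdint>
#include <string>
#include <boost/bimap.hpp>
using namespace std;

typedef uint32_t key;

struct User
{
    User(const string& first_name, const string& last_name)
        : first_name{ add(first_name)}, last_name{ add(last_name)} {}

    ...
protected:
    key first_name, last_name;                // keys into names, not strings
    static boost::bimap<key, string> names;   // key <-> name, each name stored once
    static key seed;                          // last key handed out
    static key add(const string& s) {...}
};

// static data members must be defined once, in a .cpp file
boost::bimap<key, string> User::names;
key User::seed = 0;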

Here is the implementation of add:

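static key add(const string& s)
{
    auto it = names.right.find(s);
    if( it == names.right.end())
    {
        // s is not in names yet: assign the next key and insert the pair
        names.insert(boost::bimap<key, string>::value_type(++seed, s));
        return seed;
    }
    // in the right view, it->first is the name and it->second is its key
    return it->second;
}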

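The code above uses boost::bimap, a bidirectional map, so we can look up a key by name as well as a name by key. This is a standard get-or-add mechanism: if the name is already stored, return its key; otherwise add it and return the newly assigned key.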
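If Boost is not available, the same get-or-add lookup can be sketched with standard containers. This is a swapped-in alternative to boost::bimap, not the original code. A std::unordered_map from name to key handles one direction, and a std::deque indexed by key handles the other. `NameTable`, `add` and `get` are hypothetical names. Keys start at 0 here rather than 1.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

using key = std::uint32_t;

// Hypothetical two-way table standing in for boost::bimap<key, std::string>.
class NameTable {
public:
    // Get-or-add: return the existing key for s, or assign the next one.
    key add(const std::string& s) {
        auto it = by_name_.find(s);
        if (it != by_name_.end())
            return it->second;                 // already stored: reuse its key
        key k = static_cast<key>(by_key_.size());
        by_key_.push_back(s);                  // key k -> name
        by_name_.emplace(s, k);                // name -> key k
        return k;
    }

    // Reverse lookup; k must have been returned by add().
    // std::deque::push_back does not invalidate references to existing
    // elements, so references returned here stay valid after later add()s.
    const std::string& get(key k) const { return by_key_.at(k); }

private:
    std::unordered_map<std::string, key> by_name_;
    std::deque<std::string> by_key_;
};
```

Each distinct name is stored twice here (once as a map key, once in the deque). That is still one copy per distinct name, rather than one copy per user.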

And here are the accessors that retrieve the actual names:

const string& get_first_name() const
{
    return names.left.find(first_name)->second;
}

const string& get_last_name() const
{
    return names.left.find(last_name)->second;
}

11.2 Boost.Flyweight

In the previous example we rolled our own implementation. Boost already provides one, boost::flyweight, so let's use it to rewrite the example:

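#include <string>
#include <boost/flyweight.hpp>
using namespace std;
using namespace boost::flyweights;

struct User2
{
    flyweight<string> first_name, last_name;

    User2(const string& first_name, const string& last_name)
        : first_name(first_name), last_name(last_name) {}
};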

It can then be used like this:

User2 john_doe{"John", "Doe"};
User2 jane_doe{"Jane", "Doe"};

// both users share a single stored "Doe"
cout << boolalpha
     << (&jane_doe.last_name.get() == &john_doe.last_name.get());
// prints: true
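To show roughly what boost::flyweight does behind the scenes, here is a hypothetical, much-simplified `SimpleFlyweight<T>` written in standard C++. All instances holding equal values share one object in a static pool, and get() returns a reference to that shared object. This is not Boost's implementation, which also offers configurable factories, locking and reference tracking. In this sketch, pooled values are never released.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical, simplified flyweight: equal values share one pooled object.
// Not thread-safe; pooled values live until program exit.
template <typename T>
class SimpleFlyweight {
public:
    SimpleFlyweight(const T& value) : ptr_{&*pool().insert(value).first} {}

    const T& get() const { return *ptr_; }

private:
    // One pool per T; node-based, so element addresses stay stable.
    static std::unordered_set<T>& pool() {
        static std::unordered_set<T> instance;
        return instance;
    }

    const T* ptr_;
};

// Hypothetical counterpart of User2 built on SimpleFlyweight.
struct User3 {
    SimpleFlyweight<std::string> first_name, last_name;

    User3(const std::string& first, const std::string& last)
        : first_name{first}, last_name{last} {}
};
```

As with User2, `john_doe.last_name.get()` and `jane_doe.last_name.get()` refer to the same object, while the two different first names are stored separately.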

11.3 String Ranges

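If you call std::string::substr(), does it return a brand-new string? In C++ it always does: substr() copies the characters into a new object. That is what you want if you intend to modify the substring independently. But what if you want changes made through the substring to be reflected in the original? Some languages (for example, Swift and Rust) represent substrings as ranges into the original string. This is an application of the Flyweight pattern: it avoids copies while still referring to the original data.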

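The C++ equivalent is std::string_view (C++17). There are similar non-owning views for arrays, such as std::span in C++20. In every case, the point is to avoid copying data. Let's try building a string range of our own.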
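A short illustration of the difference (C++17). std::string::substr copies the characters into a new string. std::string_view::substr only records a pointer and a length into the original buffer. Note that a string_view is read-only: it avoids the copy but does not let you write through it. `copy_word` and `view_word` are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// std::string::substr allocates and copies; the result owns its characters.
std::string copy_word(const std::string& text, std::size_t pos, std::size_t len) {
    return text.substr(pos, len);
}

// std::string_view::substr is O(1): the result points into text's buffer,
// so it must not outlive text.
std::string_view view_word(const std::string& text, std::size_t pos, std::size_t len) {
    return std::string_view{text}.substr(pos, len);
}
```

For `text = "This is a brave new world"`, both `copy_word(text, 10, 5)` and `view_word(text, 10, 5)` compare equal to "brave". Only the view's data() points at `text.data() + 10`.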

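Suppose a class stores some text, and we want to take a range of that text and capitalize it. We could simply change each character in that range to uppercase. But suppose we want to keep the original text intact and capitalize only when the text is written out with the stream output operator.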

11.4 Naive Approach

A naive approach is to keep an array of bools, one per character, recording whether that character should be capitalized.

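class FormattedText
{
    string plainText;
    bool*  caps;    // one flag per character
public:
    explicit FormattedText(const string& plainText)
        : plainText{plainText}
    {
        // the trailing () value-initializes every flag to false
        caps = new bool[plainText.length()]();
    }

    ~FormattedText(){
        delete[] caps;
    }

    // the class owns a raw array: forbid copies to avoid a double delete
    FormattedText(const FormattedText&) = delete;
    FormattedText& operator=(const FormattedText&) = delete;
};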

Now we can add a member function that marks a range of characters for capitalization:

// mark characters start..end (inclusive) to be capitalized
void capitalize(size_t start, size_t end)
{
    for(size_t i=start; i<=end && i<plainText.length(); ++i)
    {
        caps[i] = true;
    }
}

Then we define the stream output operator:

friend std::ostream& operator<<(std::ostream& os, const FormattedText& obj)
{
    string s;
    for(size_t i=0; i<obj.plainText.length(); ++i)
    {
        char c = obj.plainText[i];
        // toupper expects a value representable as unsigned char
        s += obj.caps[i]
            ? static_cast<char>(toupper(static_cast<unsigned char>(c)))
            : c;
    }
    return os << s;
}

This works:

FormattedText ft("This is a brave new world");
ft.capitalize(10,15);
cout << ft << endl;    // prints: This is a BRAVE new world
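Put together, the naive version might look like the following self-contained sketch. It still keeps one flag per character. It uses std::unique_ptr<bool[]> instead of raw new[]/delete[]; std::make_unique value-initializes the flags to false, and unique_ptr makes the class non-copyable automatically. The `render` function is a hypothetical helper that captures the output as a string.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

class FormattedText {
    std::string plainText;
    std::unique_ptr<bool[]> caps;  // one flag per character, all false initially

public:
    explicit FormattedText(const std::string& plainText)
        : plainText{plainText},
          caps{std::make_unique<bool[]>(plainText.length())} {}

    // Mark characters in [start, end] (inclusive) for capitalization.
    void capitalize(std::size_t start, std::size_t end) {
        for (std::size_t i = start; i <= end && i < plainText.length(); ++i)
            caps[i] = true;
    }

    friend std::ostream& operator<<(std::ostream& os, const FormattedText& obj) {
        std::string s;
        for (std::size_t i = 0; i < obj.plainText.length(); ++i) {
            char c = obj.plainText[i];
            s += obj.caps[i]
                ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
                : c;
        }
        return os << s;
    }
};

// Hypothetical helper: render a FormattedText to a std::string.
std::string render(const FormattedText& ft) {
    std::ostringstream oss;
    oss << ft;
    return oss.str();
}
```

Positions are zero-based and the end is inclusive, so capitalize(10, 15) flags "brave" plus the following space. The rendered text is "This is a BRAVE new world".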

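Of course, this implementation is wasteful: it stores a bool flag for every character, when all we really need is the start and end of each formatted range. Let's reimplement it using the Flyweight pattern.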

11.5 Flyweight Implementation

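class BetterFormattedText
{
public:
    struct TextRange{
        int start, end;      // inclusive bounds
        bool capitalize;     // formatting option for this range
        bool covers(int position) const{
            return position >=start && position <=end;
        }
    };

    explicit BetterFormattedText(const string& plainText)
        : plain_text{plainText} {}
private:
    string plain_text;
    vector<TextRange> formatting;   // the flyweights
};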

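TextRange simply stores the start and end of a range along with the actual formatting information (here, only whether to capitalize). Its only member function, covers(), tells us whether the character at a given position falls within the range and therefore needs this formatting.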

BetterFormattedText stores its TextRange objects in a vector and creates new ones on request:

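TextRange& get_range(int start, int end)
{
    // the omitted capitalize member is initialized to false
    formatting.emplace_back(TextRange{start, end});
    // this reference is invalidated if formatting later reallocates,
    // so callers should use it immediately
    return formatting.back();
}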

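This function does three things:

creates a new (temporary) TextRange object;
moves it into the vector;
returns a reference to the element now stored in the vector.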

Note that this implementation does not check for duplicate or overlapping ranges; doing so could save even more memory.
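One way to act on that remark, sketched under the assumption that capitalization is the only formatting being tracked. The ranges are sorted by start, and any that overlap or touch are merged, so the vector holds no redundant entries. `merge_ranges` is a hypothetical helper, not part of the original design, and the TextRange here is a standalone copy of the nested struct.

```cpp
#include <algorithm>
#include <vector>

struct TextRange {
    int start, end;     // inclusive bounds
    bool capitalize;    // whether this range is capitalized
};

// Hypothetical helper: coalesce overlapping or adjacent capitalized ranges.
// Ranges with capitalize == false carry no formatting here and are dropped.
std::vector<TextRange> merge_ranges(std::vector<TextRange> ranges) {
    ranges.erase(std::remove_if(ranges.begin(), ranges.end(),
                                [](const TextRange& r) { return !r.capitalize; }),
                 ranges.end());
    std::sort(ranges.begin(), ranges.end(),
              [](const TextRange& a, const TextRange& b) { return a.start < b.start; });

    std::vector<TextRange> merged;
    for (const auto& r : ranges) {
        // with inclusive bounds, r touches the previous range if r.start <= end + 1
        if (!merged.empty() && r.start <= merged.back().end + 1)
            merged.back().end = std::max(merged.back().end, r.end);  // extend it
        else
            merged.push_back(r);                                     // start a new run
    }
    return merged;
}
```

For example, the capitalized ranges {10,12}, {12,15}, {0,3} and {4,4} collapse into {0,4} and {10,15}.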

Next, we implement the << operator:

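friend std::ostream& operator<<(std::ostream& os,
                                const BetterFormattedText& obj)
{
    string s;
    for(size_t i=0; i<obj.plain_text.length(); i++)
    {
        auto c = obj.plain_text[i];
        // apply every range that covers position i
        for(const auto& rng: obj.formatting)
        {
            if( rng.covers(static_cast<int>(i)) && rng.capitalize)
            {
                c = static_cast<char>(toupper(static_cast<unsigned char>(c)));
            }
        }
        s += c;   // append each character exactly once, after all ranges
    }
    return os << s;
}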

The usage code changes only slightly:

BetterFormattedText bft("This is a brave new world");
bft.get_range(10,15).capitalize = true;
cout << bft << endl;    // prints: This is a BRAVE new world
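Here the pieces are assembled into one self-contained sketch. Each character is appended exactly once, after all covering ranges are applied, and toupper is called through unsigned char. `render` is a hypothetical helper that captures the output as a string. TextRange is created with an explicit `false` so that its flag is always initialized.

```cpp
#include <cctype>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class BetterFormattedText {
public:
    struct TextRange {
        int start, end;     // inclusive character positions
        bool capitalize;    // formatting option for this range

        bool covers(int position) const {
            return position >= start && position <= end;
        }
    };

    explicit BetterFormattedText(const std::string& plainText)
        : plain_text{plainText} {}

    // Append a new range and return it so the caller can set its flags.
    // The reference is invalidated by a later get_range() call that makes
    // the vector reallocate, so use it immediately.
    TextRange& get_range(int start, int end) {
        formatting.push_back(TextRange{start, end, false});
        return formatting.back();
    }

    friend std::ostream& operator<<(std::ostream& os, const BetterFormattedText& obj) {
        std::string s;
        for (std::size_t i = 0; i < obj.plain_text.length(); ++i) {
            char c = obj.plain_text[i];
            for (const auto& rng : obj.formatting) {
                if (rng.capitalize && rng.covers(static_cast<int>(i)))
                    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            }
            s += c;  // exactly once per character
        }
        return os << s;
    }

private:
    std::string plain_text;
    std::vector<TextRange> formatting;
};

// Hypothetical helper: render a BetterFormattedText to a std::string.
std::string render(const BetterFormattedText& t) {
    std::ostringstream oss;
    oss << t;
    return oss.str();
}
```

The formatting state now costs one small TextRange per range instead of one flag per character, however long the text is.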