# LMO - Lua Machine Objects

See [online wiki](https://github.com/openwrt/luci/wiki/LMO) for the latest version.

LMO is a simple binary format that packs language strings into a more compact form.
Although it can store any kind of key-value table, it is currently used only for the LuCI \*.po-based translation system.
The abbreviation "LMO" stands for "Lua Machine Objects", in the style of the GNU gettext \*.mo format.

## Format Specification

An LMO file is divided into two parts: the payload and the index lookup table.
All segments of the file are 4-byte aligned to ease reading and processing of the format.
All integer fields are unsigned 32-bit values stored in network byte order (big-endian), so an implementation has to convert them with ntohl() when reading.

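The two conventions above can be sketched in a few lines of C. This is only an illustration: the helper names `lmo_read_be32` and `lmo_align4` are hypothetical and not part of LuCI. Decoding byte by byte gives the same result as a network-to-host conversion and does not depend on the buffer being aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit unsigned integer stored in network (big-endian) byte
 * order. Hypothetical helper, not taken from the LuCI sources. */
static uint32_t lmo_read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Round a length up to the next multiple of 4, i.e. the size a segment
 * occupies once it is padded to a 4-byte boundary. */
static size_t lmo_align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}
```

For example, a 5-byte string would occupy 8 bytes in the payload under this rounding, while an 8-byte string needs no padding.
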
Schema:

    <file:
      <payload:
        <entry #1: 4 byte aligned data>

        <entry #2: 4 byte aligned data>

        ...

        <entry #N: 4 byte aligned data>
      >

      <index table:
        <entry #1:
          <uint32_t: hash of the first key>
          <uint32_t: hash of the first value>
          <uint32_t: file offset of the first value>
          <uint32_t: length of the first value>
        >

        <entry #2:
          <uint32_t: hash of the second key>
          <uint32_t: hash of the second value>
          <uint32_t: file offset of the second value>
          <uint32_t: length of the second value>
        >

        ...

        <entry #N:
          <uint32_t: hash of the Nth key>
          <uint32_t: hash of the Nth value>
          <uint32_t: file offset of the Nth value>
          <uint32_t: length of the Nth value>
        >
      >

      <uint32_t: offset of the start of the index table>
    >

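One way to picture a single index table entry in C is a struct of four 32-bit fields. The following is only a sketch: the struct name, the field names and the `lmo_decode_entry` helper are illustrative assumptions chosen to mirror the schema, not names confirmed by the reference implementation.

```c
#include <stdint.h>

/* In-memory view of one index table entry (16 bytes in the file).
 * Names are hypothetical and simply follow the schema order. */
struct lmo_index_entry {
    uint32_t key_id;  /* hash of the key */
    uint32_t val_id;  /* hash of the value */
    uint32_t offset;  /* file offset of the value */
    uint32_t length;  /* length of the value in bytes */
};

/* Decode one big-endian field from four raw bytes. */
static uint32_t lmo_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Convert the 16 raw bytes of a stored entry to host byte order. */
static struct lmo_index_entry lmo_decode_entry(const uint8_t *raw)
{
    struct lmo_index_entry e;

    e.key_id = lmo_be32(raw);
    e.val_id = lmo_be32(raw + 4);
    e.offset = lmo_be32(raw + 8);
    e.length = lmo_be32(raw + 12);
    return e;
}
```

Decoding field by field, rather than casting the raw bytes to the struct, keeps the sketch independent of host endianness and struct padding.
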
## Processing

To process an LMO file, an implementation has to perform the following steps:

### Read Index

1. Locate and open the archive file
2. Seek to end of file - 4 bytes (sizeof(uint32_t))
3. Read the 32-bit index offset and swap it from network to native byte order
4. Seek to the index offset and calculate the index length: filesize - index offset - 4
5. Initialize a linked list for the index table entries
6. Read index entries until the index length is reached; for each entry, read and byteswap 4 * uint32_t
7. Seek to the beginning of the file

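The index-reading steps above could be sketched with standard C I/O as follows. This is a minimal illustration, not the reference implementation: `struct lmo_entry`, `be32`, `lmo_read_index` and `lmo_free_index` are hypothetical names, and the sanity checks on the offset and index length are an added assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical linked-list node holding one decoded index entry. */
struct lmo_entry {
    uint32_t key_id, val_id, offset, length;
    struct lmo_entry *next;
};

/* Decode a big-endian 32-bit value (network to native byte order). */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void lmo_free_index(struct lmo_entry *head)
{
    while (head != NULL) {
        struct lmo_entry *next = head->next;
        free(head);
        head = next;
    }
}

/* Returns 0 and stores the list head in *out on success, -1 on error. */
static int lmo_read_index(FILE *fp, struct lmo_entry **out)
{
    unsigned char buf[16];
    struct lmo_entry *head = NULL, **tail = &head;
    long size, idx_len, pos;
    uint32_t idx_off;

    *out = NULL;

    /* Step 2: seek to end of file - 4 bytes */
    if (fseek(fp, -4L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return -1;
    size += 4; /* total file size */

    /* Step 3: read the index offset and convert it to native order */
    if (fread(buf, 1, 4, fp) != 4)
        return -1;
    idx_off = be32(buf);

    /* Step 4: compute the index length and seek to the index */
    if ((unsigned long)idx_off > (unsigned long)(size - 4))
        return -1;
    idx_len = size - 4 - (long)idx_off;
    if (idx_len % 16 != 0 || fseek(fp, (long)idx_off, SEEK_SET) != 0)
        return -1;

    /* Steps 5 and 6: append one list node per 16-byte entry */
    for (pos = 0; pos < idx_len; pos += 16) {
        struct lmo_entry *e;

        if (fread(buf, 1, 16, fp) != 16 || (e = malloc(sizeof(*e))) == NULL) {
            lmo_free_index(head);
            return -1;
        }
        e->key_id = be32(buf);
        e->val_id = be32(buf + 4);
        e->offset = be32(buf + 8);
        e->length = be32(buf + 12);
        e->next = NULL;
        *tail = e;
        tail = &e->next;
    }

    /* Step 7: seek back to the beginning of the file */
    rewind(fp);
    *out = head;
    return 0;
}
```

A caller would open the file in binary mode, call `lmo_read_index(fp, &index)`, and release the list with `lmo_free_index(index)` when done. An array would serve equally well in place of the linked list; the list is used here only because the steps describe one.
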
### Read Entry

1. Calculate the unsigned 32-bit hash of the entry's key (see the "Hash Function" section below)
2. Obtain the archive index
3. Iterate through the linked index list and perform the following steps for each entry:
   1. Compare the entry's key hash with the hash calculated in step 1
   2. If the hash values are equal, proceed with step 4
   3. Otherwise, select the next entry and repeat from step 3.1
4. Seek to the file offset specified in the selected entry
5. Read as many bytes as specified in the entry length into a buffer
6. Return the buffer value

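A lookup along these lines might look like the sketch below. It assumes the index has already been loaded into a linked list and that the caller passes in the key hash precomputed (step 1), for instance with the `sfh_hash()` function from the "Hash Function" section. The names `struct lmo_entry` and `lmo_read_value` are hypothetical, and returning a NUL-terminated heap buffer is a convenience chosen for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical linked-list node holding one decoded index entry. */
struct lmo_entry {
    uint32_t key_id, val_id, offset, length;
    struct lmo_entry *next;
};

/* Look up key_hash in the index and read the matching value from fp.
 * Returns a malloc'd, NUL-terminated copy of the value (the caller
 * frees it), or NULL if the key is not found or reading fails. */
static char *lmo_read_value(FILE *fp, const struct lmo_entry *index,
                            uint32_t key_hash, uint32_t *len_out)
{
    const struct lmo_entry *e;
    char *buf;

    /* Step 3: walk the list until an entry with the same hash is found */
    for (e = index; e != NULL; e = e->next)
        if (e->key_id == key_hash)
            break;
    if (e == NULL)
        return NULL;

    /* Step 4: seek to the file offset stored in the entry */
    if (fseek(fp, (long)e->offset, SEEK_SET) != 0)
        return NULL;

    /* Step 5: read exactly e->length bytes into a buffer */
    buf = malloc((size_t)e->length + 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, e->length, fp) != e->length) {
        free(buf);
        return NULL;
    }
    buf[e->length] = '\0';

    /* Step 6: return the buffer and, optionally, its length */
    if (len_out != NULL)
        *len_out = e->length;
    return buf;
}
```

Because only hashes are compared, two different keys with the same hash would be indistinguishable to this lookup; the sketch simply returns the first match, as the steps describe.
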
## Hash Function

The current LuCI LMO implementation uses the "Super Fast Hash" function, which was kindly put in the public domain by its original author. See http://www.azillionmonkeys.com/qed/hash.html for details. Below is the C implementation of this function:

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uint8_t, uint16_t, uint32_t */

/* Fetch 16 bits: direct load on 32-bit x86 with GCC, byte-wise elsewhere */
#if (defined(__GNUC__) && defined(__i386__))
#define sfh_get16(d) (*((const uint16_t *) (d)))
#else
#define sfh_get16(d) ((((uint32_t)(((const uint8_t *)(d))[1])) << 8)\
                       +(uint32_t)(((const uint8_t *)(d))[0]) )
#endif

uint32_t sfh_hash(const char * data, int len)
{
    uint32_t hash = len, tmp;
    int rem;

    if (len <= 0 || data == NULL) return 0;

    rem = len & 3;  /* bytes left over after the 4-byte blocks */
    len >>= 2;      /* number of full 4-byte blocks */

    /* Main loop */
    for (;len > 0; len--) {
        hash += sfh_get16(data);
        tmp   = (sfh_get16(data+2) << 11) ^ hash;
        hash  = (hash << 16) ^ tmp;
        data += 2*sizeof(uint16_t);
        hash += hash >> 11;
    }

    /* Handle end cases */
    switch (rem) {
        case 3: hash += sfh_get16(data);
                hash ^= hash << 16;
                hash ^= data[sizeof(uint16_t)] << 18;
                hash += hash >> 11;
                break;
        case 2: hash += sfh_get16(data);
                hash ^= hash << 11;
                hash += hash >> 17;
                break;
        case 1: hash += *data;
                hash ^= hash << 10;
                hash += hash >> 1;
    }

    /* Force "avalanching" of final 127 bits */
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;

    return hash;
}
```

## Reference Implementation

A reference implementation can be found here:
https://github.com/openwrt/luci/blob/master/modules/luci-base/src/template_lmo.c

The `po2lmo.c` source implements a `*.po` to `*.lmo` conversion utility.
Lua bindings for LMO are defined in `template_lualib.c` and associated headers.