1*dcadb2baSIngo Weinhold /*- 2*dcadb2baSIngo Weinhold * Copyright (c) 1998 Matthew Dillon. All Rights Reserved. 3*dcadb2baSIngo Weinhold * Redistribution and use in source and binary forms, with or without 4*dcadb2baSIngo Weinhold * modification, are permitted provided that the following conditions 5*dcadb2baSIngo Weinhold * are met: 6*dcadb2baSIngo Weinhold * 1. Redistributions of source code must retain the above copyright 7*dcadb2baSIngo Weinhold * notice, this list of conditions and the following disclaimer. 8*dcadb2baSIngo Weinhold * 2. Redistributions in binary form must reproduce the above copyright 9*dcadb2baSIngo Weinhold * notice, this list of conditions and the following disclaimer in the 10*dcadb2baSIngo Weinhold * documentation and/or other materials provided with the distribution. 11*dcadb2baSIngo Weinhold * 4. Neither the name of the University nor the names of its contributors 12*dcadb2baSIngo Weinhold * may be used to endorse or promote products derived from this software 13*dcadb2baSIngo Weinhold * without specific prior written permission. 14*dcadb2baSIngo Weinhold * 15*dcadb2baSIngo Weinhold * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS 16*dcadb2baSIngo Weinhold * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 17*dcadb2baSIngo Weinhold * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 18*dcadb2baSIngo Weinhold * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY 19*dcadb2baSIngo Weinhold * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 20*dcadb2baSIngo Weinhold * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE 21*dcadb2baSIngo Weinhold * GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 22*dcadb2baSIngo Weinhold * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, 23*dcadb2baSIngo Weinhold * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 24*dcadb2baSIngo Weinhold * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 25*dcadb2baSIngo Weinhold * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26*dcadb2baSIngo Weinhold */ 27*dcadb2baSIngo Weinhold 28*dcadb2baSIngo Weinhold /* 29*dcadb2baSIngo Weinhold * BLIST.C - Bitmap allocator/deallocator, using a radix tree with hinting 30*dcadb2baSIngo Weinhold * 31*dcadb2baSIngo Weinhold * This module implements a general bitmap allocator/deallocator. The 32*dcadb2baSIngo Weinhold * allocator eats around 2 bits per 'block'. The module does not 33*dcadb2baSIngo Weinhold * try to interpret the meaning of a 'block' other then to return 34*dcadb2baSIngo Weinhold * SWAPBLK_NONE on an allocation failure. 35*dcadb2baSIngo Weinhold * 36*dcadb2baSIngo Weinhold * A radix tree is used to maintain the bitmap. Two radix constants are 37*dcadb2baSIngo Weinhold * involved: One for the bitmaps contained in the leaf nodes (typically 38*dcadb2baSIngo Weinhold * 32), and one for the meta nodes (typically 16). Both meta and leaf 39*dcadb2baSIngo Weinhold * nodes have a hint field. This field gives us a hint as to the largest 40*dcadb2baSIngo Weinhold * free contiguous range of blocks under the node. It may contain a 41*dcadb2baSIngo Weinhold * value that is too high, but will never contain a value that is too 42*dcadb2baSIngo Weinhold * low. When the radix tree is searched, allocation failures in subtrees 43*dcadb2baSIngo Weinhold * update the hint. 44*dcadb2baSIngo Weinhold * 45*dcadb2baSIngo Weinhold * The radix tree also implements two collapsed states for meta nodes: 46*dcadb2baSIngo Weinhold * the ALL-ALLOCATED state and the ALL-FREE state. If a meta node is 47*dcadb2baSIngo Weinhold * in either of these two states, all information contained underneath 48*dcadb2baSIngo Weinhold * the node is considered stale. These states are used to optimize 49*dcadb2baSIngo Weinhold * allocation and freeing operations. 50*dcadb2baSIngo Weinhold * 51*dcadb2baSIngo Weinhold * The hinting greatly increases code efficiency for allocations while 52*dcadb2baSIngo Weinhold * the general radix structure optimizes both allocations and frees. The 53*dcadb2baSIngo Weinhold * radix tree should be able to operate well no matter how much 54*dcadb2baSIngo Weinhold * fragmentation there is and no matter how large a bitmap is used. 55*dcadb2baSIngo Weinhold * 56*dcadb2baSIngo Weinhold * Unlike the rlist code, the blist code wires all necessary memory at 57*dcadb2baSIngo Weinhold * creation time. Neither allocations nor frees require interaction with 58*dcadb2baSIngo Weinhold * the memory subsystem. In contrast, the rlist code may allocate memory 59*dcadb2baSIngo Weinhold * on an rlist_free() call. The non-blocking features of the blist code 60*dcadb2baSIngo Weinhold * are used to great advantage in the swap code (vm/nswap_pager.c). The 61*dcadb2baSIngo Weinhold * rlist code uses a little less overall memory then the blist code (but 62*dcadb2baSIngo Weinhold * due to swap interleaving not all that much less), but the blist code 63*dcadb2baSIngo Weinhold * scales much, much better. 64*dcadb2baSIngo Weinhold * 65*dcadb2baSIngo Weinhold * LAYOUT: The radix tree is layed out recursively using a 66*dcadb2baSIngo Weinhold * linear array. Each meta node is immediately followed (layed out 67*dcadb2baSIngo Weinhold * sequentially in memory) by BLIST_META_RADIX lower level nodes. This 68*dcadb2baSIngo Weinhold * is a recursive structure but one that can be easily scanned through 69*dcadb2baSIngo Weinhold * a very simple 'skip' calculation. In order to support large radixes, 70*dcadb2baSIngo Weinhold * portions of the tree may reside outside our memory allocation. We 71*dcadb2baSIngo Weinhold * handle this with an early-termination optimization (when bighint is 72*dcadb2baSIngo Weinhold * set to -1) on the scan. The memory allocation is only large enough 73*dcadb2baSIngo Weinhold * to cover the number of blocks requested at creation time even if it 74*dcadb2baSIngo Weinhold * must be encompassed in larger root-node radix. 75*dcadb2baSIngo Weinhold * 76*dcadb2baSIngo Weinhold * NOTE: the allocator cannot currently allocate more then 77*dcadb2baSIngo Weinhold * BLIST_BMAP_RADIX blocks per call. It will panic with 'allocation too 78*dcadb2baSIngo Weinhold * large' if you try. This is an area that could use improvement. The 79*dcadb2baSIngo Weinhold * radix is large enough that this restriction does not effect the swap 80*dcadb2baSIngo Weinhold * system, though. Currently only the allocation code is effected by 81*dcadb2baSIngo Weinhold * this algorithmic unfeature. The freeing code can handle arbitrary 82*dcadb2baSIngo Weinhold * ranges. 83*dcadb2baSIngo Weinhold * 84*dcadb2baSIngo Weinhold * This code can be compiled stand-alone for debugging. 85*dcadb2baSIngo Weinhold */ 86*dcadb2baSIngo Weinhold 87*dcadb2baSIngo Weinhold /* 88*dcadb2baSIngo Weinhold * NOTE: a few modifications is made in the orginal FreeBSD code: 89*dcadb2baSIngo Weinhold * 1. change the name of some variables/constants 90*dcadb2baSIngo Weinhold * 2. discard the code handling ALL_FREE and ALL_ALLOCATED state 91*dcadb2baSIngo Weinhold * 3. allocate more than 32 slots won't lead to kernel panic 92*dcadb2baSIngo Weinhold * 4. remove all the code for debugging 93*dcadb2baSIngo Weinhold * 94*dcadb2baSIngo Weinhold * Zhao Shuai 95*dcadb2baSIngo Weinhold * upczhsh@163.com 96*dcadb2baSIngo Weinhold */ 97*dcadb2baSIngo Weinhold 98*dcadb2baSIngo Weinhold 99*dcadb2baSIngo Weinhold #include <stdlib.h> 100*dcadb2baSIngo Weinhold #include <util/RadixBitmap.h> 101*dcadb2baSIngo Weinhold 102*dcadb2baSIngo Weinhold #define TERMINATOR -1 103*dcadb2baSIngo Weinhold 104*dcadb2baSIngo Weinhold 105*dcadb2baSIngo Weinhold static uint32 106*dcadb2baSIngo Weinhold radix_bitmap_init(radix_node *node, uint32 radix, uint32 skip, uint32 slots) 107*dcadb2baSIngo Weinhold { 108*dcadb2baSIngo Weinhold uint32 index = 0; 109*dcadb2baSIngo Weinhold int32 big_hint = radix < slots ? radix : slots; 110*dcadb2baSIngo Weinhold 111*dcadb2baSIngo Weinhold // leaf node 112*dcadb2baSIngo Weinhold if (radix == BITMAP_RADIX) { 113*dcadb2baSIngo Weinhold if (node) { 114*dcadb2baSIngo Weinhold node->big_hint = big_hint; 115*dcadb2baSIngo Weinhold if (big_hint == BITMAP_RADIX) 116*dcadb2baSIngo Weinhold node->u.bitmap = 0; 117*dcadb2baSIngo Weinhold else 118*dcadb2baSIngo Weinhold node->u.bitmap = (bitmap_t)-1 << big_hint; 119*dcadb2baSIngo Weinhold } 120*dcadb2baSIngo Weinhold return index; 121*dcadb2baSIngo Weinhold } 122*dcadb2baSIngo Weinhold 123*dcadb2baSIngo Weinhold // not leaf node 124*dcadb2baSIngo Weinhold if (node) { 125*dcadb2baSIngo Weinhold node->big_hint = big_hint; 126*dcadb2baSIngo Weinhold node->u.available = big_hint; 127*dcadb2baSIngo Weinhold } 128*dcadb2baSIngo Weinhold 129*dcadb2baSIngo Weinhold radix /= NODE_RADIX; 130*dcadb2baSIngo Weinhold uint32 next_skip = skip / NODE_RADIX; 131*dcadb2baSIngo Weinhold 132*dcadb2baSIngo Weinhold uint32 i; 133*dcadb2baSIngo Weinhold for (i = 1; i <= skip; i += next_skip) { 134*dcadb2baSIngo Weinhold if (slots >= radix) { 135*dcadb2baSIngo Weinhold index = i + radix_bitmap_init(node ? &node[i] : NULL, 136*dcadb2baSIngo Weinhold radix, next_skip - 1, radix); 137*dcadb2baSIngo Weinhold slots -= radix; 138*dcadb2baSIngo Weinhold } else if (slots > 0) { 139*dcadb2baSIngo Weinhold index = i + radix_bitmap_init(node ? &node[i] : NULL, 140*dcadb2baSIngo Weinhold radix, next_skip - 1, slots); 141*dcadb2baSIngo Weinhold slots = 0; 142*dcadb2baSIngo Weinhold } else { // add a terminator 143*dcadb2baSIngo Weinhold if (node) 144*dcadb2baSIngo Weinhold node[i].big_hint = TERMINATOR; 145*dcadb2baSIngo Weinhold break; 146*dcadb2baSIngo Weinhold } 147*dcadb2baSIngo Weinhold } 148*dcadb2baSIngo Weinhold 149*dcadb2baSIngo Weinhold if (index < i) 150*dcadb2baSIngo Weinhold index = i; 151*dcadb2baSIngo Weinhold 152*dcadb2baSIngo Weinhold return index; 153*dcadb2baSIngo Weinhold } 154*dcadb2baSIngo Weinhold 155*dcadb2baSIngo Weinhold 156*dcadb2baSIngo Weinhold radix_bitmap * 157*dcadb2baSIngo Weinhold radix_bitmap_create(uint32 slots) 158*dcadb2baSIngo Weinhold { 159*dcadb2baSIngo Weinhold uint32 radix = BITMAP_RADIX; 160*dcadb2baSIngo Weinhold uint32 skip = 0; 161*dcadb2baSIngo Weinhold 162*dcadb2baSIngo Weinhold while (radix < slots) { 163*dcadb2baSIngo Weinhold radix *= NODE_RADIX; 164*dcadb2baSIngo Weinhold skip = (skip + 1) * NODE_RADIX; 165*dcadb2baSIngo Weinhold } 166*dcadb2baSIngo Weinhold 167*dcadb2baSIngo Weinhold radix_bitmap *bmp = (radix_bitmap *)malloc(sizeof(radix_bitmap)); 168*dcadb2baSIngo Weinhold if (bmp == NULL) 169*dcadb2baSIngo Weinhold return NULL; 170*dcadb2baSIngo Weinhold 171*dcadb2baSIngo Weinhold bmp->radix = radix; 172*dcadb2baSIngo Weinhold bmp->skip = skip; 173*dcadb2baSIngo Weinhold bmp->free_slots = slots; 174*dcadb2baSIngo Weinhold bmp->root_size = 1 + radix_bitmap_init(NULL, radix, skip, slots); 175*dcadb2baSIngo Weinhold 176*dcadb2baSIngo Weinhold bmp->root = (radix_node *)malloc(bmp->root_size * sizeof(radix_node)); 177*dcadb2baSIngo Weinhold if (bmp->root == NULL) { 178*dcadb2baSIngo Weinhold free(bmp); 179*dcadb2baSIngo Weinhold return NULL; 180*dcadb2baSIngo Weinhold } 181*dcadb2baSIngo Weinhold 182*dcadb2baSIngo Weinhold radix_bitmap_init(bmp->root, radix, skip, slots); 183*dcadb2baSIngo Weinhold 184*dcadb2baSIngo Weinhold return bmp; 185*dcadb2baSIngo Weinhold } 186*dcadb2baSIngo Weinhold 187*dcadb2baSIngo Weinhold 188*dcadb2baSIngo Weinhold void 189*dcadb2baSIngo Weinhold radix_bitmap_destroy(radix_bitmap *bmp) 190*dcadb2baSIngo Weinhold { 191*dcadb2baSIngo Weinhold free(bmp->root); 192*dcadb2baSIngo Weinhold free(bmp); 193*dcadb2baSIngo Weinhold } 194*dcadb2baSIngo Weinhold 195*dcadb2baSIngo Weinhold 196*dcadb2baSIngo Weinhold static swap_addr_t 197*dcadb2baSIngo Weinhold radix_leaf_alloc(radix_node *leaf, swap_addr_t slotIndex, int32 count) 198*dcadb2baSIngo Weinhold { 199*dcadb2baSIngo Weinhold if (count <= BITMAP_RADIX) { 200*dcadb2baSIngo Weinhold bitmap_t bitmap = ~leaf->u.bitmap; 201*dcadb2baSIngo Weinhold uint32 n = BITMAP_RADIX - count; 202*dcadb2baSIngo Weinhold bitmap_t mask = (bitmap_t)-1 >> n; 203*dcadb2baSIngo Weinhold for (uint32 j = 0; j <= n; j++) { 204*dcadb2baSIngo Weinhold if ((bitmap & mask) == mask) { 205*dcadb2baSIngo Weinhold leaf->u.bitmap |= mask; 206*dcadb2baSIngo Weinhold return (slotIndex + j); 207*dcadb2baSIngo Weinhold } 208*dcadb2baSIngo Weinhold mask <<= 1; 209*dcadb2baSIngo Weinhold } 210*dcadb2baSIngo Weinhold } 211*dcadb2baSIngo Weinhold 212*dcadb2baSIngo Weinhold // we could not allocate count here, update big_hint 213*dcadb2baSIngo Weinhold if (leaf->big_hint >= count) 214*dcadb2baSIngo Weinhold leaf->big_hint = count - 1; 215*dcadb2baSIngo Weinhold return SWAP_SLOT_NONE; 216*dcadb2baSIngo Weinhold } 217*dcadb2baSIngo Weinhold 218*dcadb2baSIngo Weinhold 219*dcadb2baSIngo Weinhold static swap_addr_t 220*dcadb2baSIngo Weinhold radix_node_alloc(radix_node *node, swap_addr_t slotIndex, int32 count, 221*dcadb2baSIngo Weinhold uint32 radix, uint32 skip) 222*dcadb2baSIngo Weinhold { 223*dcadb2baSIngo Weinhold uint32 next_skip = skip / NODE_RADIX; 224*dcadb2baSIngo Weinhold radix /= NODE_RADIX; 225*dcadb2baSIngo Weinhold 226*dcadb2baSIngo Weinhold for (uint32 i = 1; i <= skip; i += next_skip) { 227*dcadb2baSIngo Weinhold if (node[i].big_hint == TERMINATOR) // TERMINATOR 228*dcadb2baSIngo Weinhold break; 229*dcadb2baSIngo Weinhold 230*dcadb2baSIngo Weinhold if (count <= node[i].big_hint) { 231*dcadb2baSIngo Weinhold swap_addr_t addr = SWAP_SLOT_NONE; 232*dcadb2baSIngo Weinhold if (next_skip == 1) 233*dcadb2baSIngo Weinhold addr = radix_leaf_alloc(&node[i], slotIndex, count); 234*dcadb2baSIngo Weinhold else 235*dcadb2baSIngo Weinhold addr = radix_node_alloc(&node[i], slotIndex, count, radix, 236*dcadb2baSIngo Weinhold next_skip - 1); 237*dcadb2baSIngo Weinhold if (addr != SWAP_SLOT_NONE) { 238*dcadb2baSIngo Weinhold node->u.available -= count; 239*dcadb2baSIngo Weinhold if (node->big_hint > node->u.available) 240*dcadb2baSIngo Weinhold node->big_hint = node->u.available; 241*dcadb2baSIngo Weinhold 242*dcadb2baSIngo Weinhold return addr; 243*dcadb2baSIngo Weinhold } 244*dcadb2baSIngo Weinhold } 245*dcadb2baSIngo Weinhold slotIndex += radix; 246*dcadb2baSIngo Weinhold } 247*dcadb2baSIngo Weinhold 248*dcadb2baSIngo Weinhold // we could not allocate count in the subtree, update big_hint 249*dcadb2baSIngo Weinhold if (node->big_hint >= count) 250*dcadb2baSIngo Weinhold node->big_hint = count - 1; 251*dcadb2baSIngo Weinhold return SWAP_SLOT_NONE; 252*dcadb2baSIngo Weinhold } 253*dcadb2baSIngo Weinhold 254*dcadb2baSIngo Weinhold 255*dcadb2baSIngo Weinhold swap_addr_t 256*dcadb2baSIngo Weinhold radix_bitmap_alloc(radix_bitmap *bmp, uint32 count) 257*dcadb2baSIngo Weinhold { 258*dcadb2baSIngo Weinhold swap_addr_t addr = SWAP_SLOT_NONE; 259*dcadb2baSIngo Weinhold 260*dcadb2baSIngo Weinhold if (bmp->radix == BITMAP_RADIX) 261*dcadb2baSIngo Weinhold addr = radix_leaf_alloc(bmp->root, 0, count); 262*dcadb2baSIngo Weinhold else 263*dcadb2baSIngo Weinhold addr = radix_node_alloc(bmp->root, 0, count, bmp->radix, bmp->skip); 264*dcadb2baSIngo Weinhold 265*dcadb2baSIngo Weinhold if (addr != SWAP_SLOT_NONE) 266*dcadb2baSIngo Weinhold bmp->free_slots -= count; 267*dcadb2baSIngo Weinhold 268*dcadb2baSIngo Weinhold return addr; 269*dcadb2baSIngo Weinhold } 270*dcadb2baSIngo Weinhold 271*dcadb2baSIngo Weinhold 272*dcadb2baSIngo Weinhold static void 273*dcadb2baSIngo Weinhold radix_leaf_dealloc(radix_node *leaf, swap_addr_t slotIndex, uint32 count) 274*dcadb2baSIngo Weinhold { 275*dcadb2baSIngo Weinhold uint32 n = slotIndex & (BITMAP_RADIX - 1); 276*dcadb2baSIngo Weinhold bitmap_t mask = ((bitmap_t)-1 >> (BITMAP_RADIX - count - n)) 277*dcadb2baSIngo Weinhold & ((bitmap_t)-1 << n); 278*dcadb2baSIngo Weinhold leaf->u.bitmap &= ~mask; 279*dcadb2baSIngo Weinhold 280*dcadb2baSIngo Weinhold leaf->big_hint = BITMAP_RADIX; 281*dcadb2baSIngo Weinhold } 282*dcadb2baSIngo Weinhold 283*dcadb2baSIngo Weinhold 284*dcadb2baSIngo Weinhold static void 285*dcadb2baSIngo Weinhold radix_node_dealloc(radix_node *node, swap_addr_t slotIndex, uint32 count, 286*dcadb2baSIngo Weinhold uint32 radix, uint32 skip, swap_addr_t index) 287*dcadb2baSIngo Weinhold { 288*dcadb2baSIngo Weinhold node->u.available += count; 289*dcadb2baSIngo Weinhold 290*dcadb2baSIngo Weinhold uint32 next_skip = skip / NODE_RADIX; 291*dcadb2baSIngo Weinhold radix /= NODE_RADIX; 292*dcadb2baSIngo Weinhold 293*dcadb2baSIngo Weinhold uint32 i = (slotIndex - index) / radix; 294*dcadb2baSIngo Weinhold index += i * radix; 295*dcadb2baSIngo Weinhold i = i * next_skip + 1; 296*dcadb2baSIngo Weinhold 297*dcadb2baSIngo Weinhold while (i <= skip && index < slotIndex + count) { 298*dcadb2baSIngo Weinhold uint32 v = index + radix - slotIndex; 299*dcadb2baSIngo Weinhold if (v > count) 300*dcadb2baSIngo Weinhold v = count; 301*dcadb2baSIngo Weinhold 302*dcadb2baSIngo Weinhold if (next_skip == 1) 303*dcadb2baSIngo Weinhold radix_leaf_dealloc(&node[i], slotIndex, v); 304*dcadb2baSIngo Weinhold else 305*dcadb2baSIngo Weinhold radix_node_dealloc(&node[i], slotIndex, v, radix, 306*dcadb2baSIngo Weinhold next_skip - 1, index); 307*dcadb2baSIngo Weinhold 308*dcadb2baSIngo Weinhold if (node->big_hint < node[i].big_hint) 309*dcadb2baSIngo Weinhold node->big_hint = node[i].big_hint; 310*dcadb2baSIngo Weinhold 311*dcadb2baSIngo Weinhold count -= v; 312*dcadb2baSIngo Weinhold slotIndex += v; 313*dcadb2baSIngo Weinhold index += radix; 314*dcadb2baSIngo Weinhold i += next_skip; 315*dcadb2baSIngo Weinhold } 316*dcadb2baSIngo Weinhold } 317*dcadb2baSIngo Weinhold 318*dcadb2baSIngo Weinhold 319*dcadb2baSIngo Weinhold void 320*dcadb2baSIngo Weinhold radix_bitmap_dealloc(radix_bitmap *bmp, swap_addr_t slotIndex, uint32 count) 321*dcadb2baSIngo Weinhold { 322*dcadb2baSIngo Weinhold if (bmp->radix == BITMAP_RADIX) 323*dcadb2baSIngo Weinhold radix_leaf_dealloc(bmp->root, slotIndex, count); 324*dcadb2baSIngo Weinhold else 325*dcadb2baSIngo Weinhold radix_node_dealloc(bmp->root, slotIndex, count, bmp->radix, 326*dcadb2baSIngo Weinhold bmp->skip, 0); 327*dcadb2baSIngo Weinhold 328*dcadb2baSIngo Weinhold bmp->free_slots += count; 329*dcadb2baSIngo Weinhold } 330*dcadb2baSIngo Weinhold 331