What does the brk() system call do?

Question

According to Linux programmers manual:

brk() and sbrk() change the location of the program break, which defines the end of the process's data segment.

What does the data segment mean over here? Is it just the data segment or data, BSS, and heap combined?

According to wiki Data segment:

Sometimes the data, BSS, and heap areas are collectively referred to as the "data segment".

I see no reason for changing the size of just the data segment. If it is data, BSS and heap collectively then it makes sense as heap will get more space.

Which brings me to my second question. In all the articles I read so far, author says that heap grows upward and stack grows downward. But what they do not explain is what happens when heap occupies all the space between heap and stack?

enter image description here

So what do you do when you're out of space ? you swap to HDD. When you have used space, you release it for other kind of information. — Igoris Azanovas, Aug 08 '11 at 21:01
@Igoris: You're confusing *physical memory* (which you can swap to disk as needed, using virtual memory) and *address space*. When you fill up your address space, no amount of swapping will give you back those addresses in the middle. — Daniel Pryden, Aug 08 '11 at 21:21
Just as a reminder, the `brk()` system call is more useful in assembly language than in C. In C, `malloc()` should be used instead of `brk()` for any data-allocation purposes -- but this does not invalidate the proposed question in any way. — alecov, Aug 09 '11 at 00:58
People need to stop talking about *the* stack, please. We live in a multithreaded world! — Ben Voigt, Aug 09 '11 at 01:25
@Ben Don't child thread stacks get allocated on the heap? Is that just pthreads? — Rag, Aug 23 '11 at 19:13
@Brian: The heap is a complex data structure for handling regions of varying sizes and alignments, free pool, etc. Thread stacks are always contiguous (in virtual address space) sequences of complete pages. In most OSes, there's a page allocator underlying stacks, heap, and memory-mapped files. — Ben Voigt, Aug 23 '11 at 19:49
@Ben But there's only one capital-S "Stack" which is manipulated by brk() and sbrk(), right? The other thread stacks are lesser things handled by user code, correct? — Rag, Aug 23 '11 at 20:54
@Brian: Who said there's any "Stack" being manipulated by `brk()` and `sbrk()`? The stacks are managed by the page allocator, at a much lower level. — Ben Voigt, Aug 23 '11 at 20:58
@Ben Oh, when I came back to this question I was thinking that there's a stack being manipulated by those system calls, but now I recall it's actually the top of the heap. I was thinking that there must be a canonical stack, but what you're saying makes sense now. It doesn't matter what's going on in the stack addresses (as long as you have a stack pointer *somewhere* or other) because you're not supposed to share them between threads anyway. — Rag, Aug 23 '11 at 21:04

score 297 · Accepted Answer · edited Aug 04 '19 at 10:48

297

In the diagram you posted, the "break"—the address manipulated by brk and sbrk—is the dotted line at the top of the heap.

simplified image of virtual memory layout

The documentation you've read describes this as the end of the "data segment" because in traditional (pre-shared-libraries, pre-mmap) Unix the data segment was continuous with the heap; before program start, the kernel would load the "text" and "data" blocks into RAM starting at address zero (actually a little above address zero, so that the NULL pointer genuinely didn't point to anything) and set the break address to the end of the data segment. The first call to malloc would then use sbrk to move the break up and create the heap in between the top of the data segment and the new, higher break address, as shown in the diagram, and subsequent use of malloc would use it to make the heap bigger as necessary.

Meantime, the stack starts at the top of memory and grows down. The stack doesn't need explicit system calls to make it bigger; either it starts off with as much RAM allocated to it as it can ever have (this was the traditional approach) or there is a region of reserved addresses below the stack, to which the kernel automatically allocates RAM when it notices an attempt to write there (this is the modern approach). Either way, there may or may not be a "guard" region at the bottom of the address space that can be used for stack. If this region exists (all modern systems do this) it is permanently unmapped; if either the stack or the heap tries to grow into it, you get a segmentation fault. Traditionally, though, the kernel made no attempt to enforce a boundary; the stack could grow into the heap, or the heap could grow into the stack, and either way they would scribble over each other's data and the program would crash. If you were very lucky it would crash immediately.

I'm not sure where the number 512GB in this diagram comes from. It implies a 64-bit virtual address space, which is inconsistent with the very simple memory map you have there. A real 64-bit address space looks more like this:

less simplified address space

              Legend:  t: text, d: data, b: BSS

This is not remotely to scale, and it shouldn't be interpreted as exactly how any given OS does stuff (after I drew it I discovered that Linux actually puts the executable much closer to address zero than I thought it did, and the shared libraries at surprisingly high addresses). The black regions of this diagram are unmapped -- any access causes an immediate segfault -- and they are gigantic relative to the gray areas. The light-gray regions are the program and its shared libraries (there can be dozens of shared libraries); each has an independent text and data segment (and "bss" segment, which also contains global data but is initialized to all-bits-zero rather than taking up space in the executable or library on disk). The heap is no longer necessarily continous with the executable's data segment -- I drew it that way, but it looks like Linux, at least, doesn't do that. The stack is no longer pegged to the top of the virtual address space, and the distance between the heap and the stack is so enormous that you don't have to worry about crossing it.

The break is still the upper limit of the heap. However, what I didn't show is that there could be dozens of independent allocations of memory off there in the black somewhere, made with mmap instead of brk. (The OS will try to keep these far away from the brk area so they don't collide.)

edited Aug 04 '19 at 10:48

Sandra Rossi

11,934
5
22
48

answered Aug 09 '11 at 01:19

zwol

135,547
38
252
361

10

+1 for a detailed explanation. Do you know if `malloc` still relies on `brk` or if it is using `mmap` to be able to "give back" separate memory blocks? – Anders Abel Aug 09 '11 at 06:08
23

It depends on the specific implementation, but IIUC a lot of current `malloc`s use the `brk` area for small allocations and individual `mmap`s for large (say, >128K) allocations. See, for instance, the discussion of MMAP_THRESHOLD in the Linux `malloc(3)` manpage. – zwol Aug 09 '11 at 15:26
3

Indeed a good explanation. But as you said that Stack no longer sits at the top of the Virtual address space. Is this true for only 64 bit address space or it is true even for 32 bit address space. And if stack sits at the top of the address space, where does anonymous memory maps happen? Is it at the top of the virtual address space just before stack. – nik Aug 09 '11 at 15:28
3

@Nikhil: it's complicated. *Most* 32-bit systems put the stack at the very top of the *user-mode* address space, which is often only the lower 2 or 3G of the full address space (the remaining space is reserved for the kernel). I can't presently think of one that didn't but I don't know 'em all. Most 64-bit CPUs don't actually let you use the entire 64-bit space; the high 10 to 16 bits of the address have to be all-zero or all-one. The stack is generally placed near the top of the usable low addresses. I can't give you a rule for `mmap`; it's extremely OS-dependent. – zwol Aug 09 '11 at 16:47
1

Actually, 512GB implies a 39-bit user address space -- possibly half of a 40-bit address space with the upper half reserved for the kernel. – Chris Dodd Aug 03 '13 at 04:23
Why are the blank areas so big? Isn't that like wasting half of your memory? – rrrrrrrrrrrrrrrr Jul 27 '16 at 09:07
5

@RiccardoBestetti It wastes _address space_, but that's harmless -- a 64-bit virtual address space is so big that if you burned through a gigabyte of it every _second_ it would still take you 500 years to run out. [\[1\]](https://people.cs.umass.edu/~mcorner/courses/691J/papers/VM/chase_sharing/chase_sharing.pdf) Most processors don't even allow use of more than 2^48 to 2^53 bits of virtual address (the only exception I know of is POWER4 in hashed page table mode). It doesn't waste physical RAM; the unused addresses are not assigned to RAM. – zwol Jul 27 '16 at 14:09
@zwol Ah, I see! They are mapped to physical memory without any space in between each other. Makes sense now. Thanks. – rrrrrrrrrrrrrrrr Jul 27 '16 at 16:30
1

An excellent explanation. So if the heap is not necessarily contiguous with the program's data segment, how do you find the lowest byte of the heap (let's limit this to the case that you have no C runtime - just some assembler code) - is it that you just call sbrk for the first time to get some storage and it tells you in the return value? – codeshot Aug 06 '17 at 11:09
@codeshot Yes, that's the essence of it. The tricky bit would be, if you have no C runtime, you probably don't get `sbrk`, only `brk`, and the raw system call `brk` might not have the same semantics as the documented `brk` function in the C library, so you have to check what the kernel actually does. – zwol Aug 06 '17 at 13:52
@zwol, yeah I wrote some x64 linux assembler. The linux brk syscall is used for sbrk, on that platform you have to synchronise while you calculate the new brk size from the old using the difference. The syscall takes a desired size and returns one of the end pointers. Initially you can request brk(0) and you get the pointer that is simultaneously both ends. I'm not sure what happens in that case when the brk size is already nonzero – codeshot Aug 09 '17 at 20:22
great explanation, though this sentence: " or there is a region of reserved addresses below the stack, to which the kernel automatically allocates RAM when it notices an attempt to write there (this is the modern approach)." - too many ambiguities with "RAM", "it" "there". Could you be more specific? – Sujay Phadke Nov 28 '17 at 07:26
2

One reason Linux and other modern OSes lay out the different memory segments in a “surprisingly” inefficient way is [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization). This is a security measure. If code and data are at unpredictable addresses, it becomes much harder for an attacker to escalate a buffer overrun to make a program execute arbitrary code. – Davislor Nov 05 '18 at 14:18
1

@Davislor Yeah, I know, but I was expecting the loader to try to pack the libraries into the first 2GB of address space anyway, because of the [small code model](https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models). Turns out that the small code model doesn't stop shared libraries and position-independent executables from being loaded at arbitrary base addresses; it only means they have to be no more than 2GB in total size. – zwol Nov 05 '18 at 14:28
1

@zwol Right, you did, but I thought newbies might like to know. – Davislor Nov 05 '18 at 14:30
Why should we explicitly expand a heap, whereas a stack rise is a kernel concern? Can't the kernel detect attempts to use an address in the heap higher than a current break one and automatically allocate RAM for it? – Ilya Loskutov Jun 15 '21 at 16:17
1

@Mergasov Allocation of memory automatically from page faults is problematic because there's no good way to report an allocation failure. We put up with it for the stack because the alternative would be to make a system call on _every function call_. For the heap, though, it's straightforward for `malloc` to call `sbrk` and/or `mmap` when (and only when) necessary, so that's what we do. – zwol Jun 15 '21 at 19:08

Ciro Santilli OurBigBook.com · Answer 2 · 2019-10-06T16:19:16.413

Minimal runnable example

What does brk( ) system call do?

Asks the kernel to let you you read and write to a contiguous chunk of memory called the heap.

If you don't ask, it might segfault you.

Without brk:

#define _GNU_SOURCE
#include <unistd.h>

int main(void) {
    /* Get the first address beyond the end of the heap. */
    void *b = sbrk(0);
    int *p = (int *)b;
    /* May segfault because it is outside of the heap. */
    *p = 1;
    return 0;
}

With brk:

#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

int main(void) {
    void *b = sbrk(0);
    int *p = (int *)b;

    /* Move it 2 ints forward */
    brk(p + 2);

    /* Use the ints. */
    *p = 1;
    *(p + 1) = 2;
    assert(*p == 1);
    assert(*(p + 1) == 2);

    /* Deallocate back. */
    brk(b);

    return 0;
}

GitHub upstream.

The above might not hit a new page and not segfault even without the brk, so here is a more aggressive version that allocates 16MiB and is very likely to segfault without the brk:

#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

int main(void) {
    void *b;
    char *p, *end;

    b = sbrk(0);
    p = (char *)b;
    end = p + 0x1000000;
    brk(end);
    while (p < end) {
        *(p++) = 1;
    }
    brk(b);
    return 0;
}

Tested on Ubuntu 18.04.

Virtual address space visualization

Before brk:

+------+ <-- Heap Start == Heap End

After brk(p + 2):

+------+ <-- Heap Start + 2 * sizof(int) == Heap End 
|      |
| You can now write your ints
| in this memory area.
|      |
+------+ <-- Heap Start

After brk(b):

+------+ <-- Heap Start == Heap End

To better understand address spaces, you should make yourself familiar with paging: How does x86 paging work?.

Why do we need both brk and sbrk?

brk could of course be implemented with sbrk + offset calculations, both exist just for convenience.

In the backend, the Linux kernel v5.0 has a single system call brk that is used to implement both: https://github.com/torvalds/linux/blob/v5.0/arch/x86/entry/syscalls/syscall_64.tbl#L23

12  common  brk         __x64_sys_brk

Is brk POSIX?

brk used to be POSIX, but it was removed in POSIX 2001, thus the need for _GNU_SOURCE to access the glibc wrapper.

The removal is likely due to the introduction mmap, which is a superset that allows multiple range to be allocated and more allocation options.

I think there is no valid case where you should to use brk instead of malloc or mmap nowadays.

brk vs malloc

brk is one old possibility of implementing malloc.

mmap is the newer stricly more powerful mechanism which likely all POSIX systems currently use to implement malloc. Here is a minimal runnable mmap memory allocation example.

Can I mix brk and malloc?

If your malloc is implemented with brk, I have no idea how that can possibly not blow up things, since brk only manages a single range of memory.

I could not however find anything about it on the glibc docs, e.g.:

https://www.gnu.org/software/libc/manual/html_mono/libc.html#Resizing-the-Data-Segment

Things will likely just work there I suppose since mmap is likely used for malloc.

See also:

More info

Internally, the kernel decides if the process can have that much memory, and earmarks memory pages for that usage.

This explains how the stack compares to the heap: What is the function of the push / pop instructions used on registers in x86 assembly?

Since `p` is a pointer to type `int`, shouldn't this have been `brk(p + 2);` ? — Johan Boulé, May 30 '16 at 12:14
Small note: The expression in the for-loop of the aggressive version should probably be `*(p + i) = 1;` — lima.sierra, Nov 14 '18 at 14:06
By the way, why do we need to use a `brk(p + 2)` instead of simply increase it by `sbrk(2)`? Is brk really necessary? — Yi Lin Liu, Mar 21 '19 at 17:00
@YiLinLiu I think it is just two very similar C frontends for a single kernel backend (`brk` syscall). `brk` is be slightly more convenient to restore previously allocated stack. — Ciro Santilli OurBigBook.com, Mar 21 '19 at 17:05
Warning: Make sure to use `printf()` at least once before the calling `sbrk(0)`. Cause the first `printf()` call on the program allocates a buffer for `stdout` and mess with `sbrk(0)`. [Reference](https://stackoverflow.com/questions/54294724/why-does-calling-sbrk0-twice-give-a-different-value). — andromeda947, Apr 14 '19 at 18:26
@SaketSharad because `p` is `int *`. What would you expect to be the correct code or comment? — Ciro Santilli OurBigBook.com, May 16 '19 at 12:00
@CiroSantilli新疆改造中心996ICU六四事件Considering size of int to be 4 bytes and size of a int * as 4 bytes (on a 32 bit machine), I was wondering shouldn't it get incremented by just 4 bytes (instead of 8 - (2 * sizeof int)). Shouldn't it point to next available heap storage - which will be 4 bytes away (not 8). Correct me if I'm missing something here. — Saket Sharad, May 16 '19 at 12:11
@SaketSharad `brk` takes a pointer to the end of the region, not the size of the delta like `sbrk`. — Ciro Santilli OurBigBook.com, May 16 '19 at 12:31
I wouldn't use `assert` here -- it would probably be better to use `if` and `printf`. — S.S. Anne, Dec 17 '19 at 15:49

score 10 · Answer 3 · edited Apr 13 '17 at 12:38

10

You can use brk and sbrk yourself to avoid the "malloc overhead" everyone's always complaining about. But you can't easily use this method in conjuction with malloc so it's only appropriate when you don't have to free anything. Because you can't. Also, you should avoid any library calls which may use malloc internally. Ie. strlen is probably safe, but fopen probably isn't.

Call sbrk just like you would call malloc. It returns a pointer to the current break and increments the break by that amount.

void *myallocate(int n){
    return sbrk(n);
}

While you can't free individual allocations (because there's no malloc-overhead, remember), you can free the entire space by calling brk with the value returned by the first call to sbrk, thus rewinding the brk.

void *memorypool;
void initmemorypool(void){
    memorypool = sbrk(0);
}
void resetmemorypool(void){
    brk(memorypool);
}

You could even stack these regions, discarding the most recent region by rewinding the break to the region's start.

One more thing ...

sbrk is also useful in code golf because it's 2 characters shorter than malloc.

edited Apr 13 '17 at 12:38

Community

1
1

answered Aug 09 '11 at 06:10

luser droog

18,988
3
53
105

7

-1 because: `malloc`/`free` most certainly *can* (and do) give memory back to the OS. They might not always do it when you want them to, but that's a matter of the heuristics being imperfectly tuned for your use case. More importantly, *it is unsafe to call `sbrk` with a nonzero argument* in any program that might ever call `malloc` -- and almost all C library functions are allowed to call `malloc` internally. The only ones that definitely won't are the [async-signal-safe](http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html) functions. – zwol Aug 03 '13 at 01:39
And by "it is unsafe", I mean "your program will crash." – zwol Aug 03 '13 at 01:39
I've edited to remove the *returning memory* boast, and mentioned the danger of library functions internally using `malloc`. – luser droog Aug 03 '13 at 02:14
1

If you want to do fancy memory allocation, either base it oon top of malloc, or on top of mmap. Don't touch brk and sbrk, they are relics from the past that do more harm than good (even the manpages tell you to stay clear of them!) – Eloff Jan 18 '14 at 23:03
Agreed. For real world use, they're a no-no. But I just wrote a program yesterday which uses them. [code-golf, of course.](http://codegolf.stackexchange.com/a/18694/2381) – luser droog Jan 18 '14 at 23:44
4

This is silly. If you want to avoid the malloc overhead for a lot of small allocations, do one big allocation (with malloc or mmap, *not* sbrk) and dole it out yourself. If you keep the nodes of your binary tree in an array, you can use 8b or 16b indices instead of 64b pointers. This works great when you don't have to delete any node until you're ready to delete *all* the nodes. (e.g. build a sorted dictionary on the fly.) Using `sbrk` for this is *only* useful for code-golf, because manually using `mmap(MAP_ANONYMOUS)` is better in every way except source-code size. – Peter Cordes Oct 01 '15 at 11:58

R.. GitHub STOP HELPING ICE · Answer 4 · 2011-08-09T00:47:26.493

5

There is a special designated anonymous private memory mapping (traditionally located just beyond the data/bss, but modern Linux will actually adjust the location with ASLR). In principle it's no better than any other mapping you could create with mmap, but Linux has some optimizations that make it possible to expand the end of this mapping (using the brk syscall) upwards with reduced locking cost relative to what mmap or mremap would incur. This makes it attractive for malloc implementations to use when implementing the main heap.

edited Aug 09 '11 at 00:47

answered Aug 08 '11 at 22:32

R.. GitHub STOP HELPING ICE

208,859
35
376
711

You meant *possible* to expand the end of this mapping upward, yes? – zwol Aug 09 '11 at 00:35
Yes, fixed. Sorry about that! – R.. GitHub STOP HELPING ICE Aug 09 '11 at 00:47

score 1 · Answer 5 · answered Mar 12 '13 at 16:44

1

malloc uses brk system call to allocate memory.

include

int main(void){

char *a = malloc(10); 
return 0;
}

run this simple program with strace, it will call brk system.

answered Mar 12 '13 at 16:44

skanzariya

397
1
4
10

score 1 · Answer 6 · answered Aug 08 '11 at 21:00

1

The heap is placed last in the program's data segment. brk() is used to change (expand) the size of the heap. When the heap cannot grow any more any malloc call will fail.

answered Aug 08 '11 at 21:00

Anders Abel

67,989
17
150
217

So you are saying that all of the diagrams in the internet, like the one in my question are wrong. If possible can you please point me to a correct diagram. – nik Aug 08 '11 at 21:20
2

@Nikkhil Keep in mind that the top of that diagram is the *end* of the memory. The top of the stack moves *downward* on the diagram as the stack grows. The top of the heap moves *upward* on the diagram as it is expanded. – Rag Aug 08 '11 at 21:28

score 0 · Answer 7 · answered Aug 08 '11 at 20:59

0

I can answer your second question. Malloc will fail and return a null pointer. That's why you always check for a null pointer when dynamically allocating memory.

answered Aug 08 '11 at 20:59

Rag

6,405
2
31
38

then what's the use of brk and sbrk? – nik Aug 08 '11 at 21:17
3

@NikhilRathod: `malloc()` will use `brk()` and/or `sbrk()` under the hood -- and you can, too, if you want to implement your own customized version of `malloc()`. – Daniel Pryden Aug 08 '11 at 21:19
@Daniel Pryden: how can brk and sbrk work on heap when it is between stack and data segment as shown in the diagram above. for this to work heap should be in the end. Am I right? – nik Aug 08 '11 at 21:27
@Nikkhil There's a little stretchy arrow on your diagram on the border between Heap and empty space. It can expand into that space. There's a **lot** of space left "empty" in between the heap and the stack. The operating system gives you the illusion of a gigantic memory space transparently so you don't have to worry about that. – Rag Aug 08 '11 at 21:33
@Nikhil: Your program has both a **data segment** and a **stack segment**. The OS manages the stack segment -- usually you have one stack for each running thread, and the size is fixed when the stack is allocated, although I vaguely recall that perhaps Linux can resize stacks. The program manages the data segment by (directly or indirectly) calling `brk()`/`sbrk()` to change where the data segment ends. All the static data (program code, etc.) are of a fixed size, so the only part that gets bigger when you resize the data segment is the heap space. – Daniel Pryden Aug 08 '11 at 21:39
@Daniel I'm not sure that the OS manages the stack. You can easily move the stack pointer around using push/pop instructions, which aren't syscalls. When you want separate threads to have separate stacks, the stacks all have to be dynamically allocated in giant chunks on the heap. I remember it being a problem since if you don't know at first exactly how many threads you need then it's hard to divide up the heap equally amongst thread stacks. – Rag Aug 08 '11 at 22:06
2

@Brian: Daniel said that the OS manages the stack *segment*, not the stack *pointer* ... very different things. The point is that there is no sbrk/brk syscall for the stack segment -- Linux automatically allocates pages upon attempts to write to the end of the stack segment. – Jim Balter Aug 08 '11 at 23:45
1

And Brian, you only answered half of half of the question. The other half is what happens if you attempt to push onto the stack when no space is available ... you get a segmentation fault. – Jim Balter Aug 08 '11 at 23:49

score 0 · Answer 8 · answered Aug 08 '11 at 21:02

0

The data segment is the portion of memory that holds all your static data, read in from the executable at launch and usually zero-filled.

answered Aug 08 '11 at 21:02

monchalve

35
7

It also holds uninitialized static data (not present in the executable) which may be garbage. – luser droog Aug 09 '11 at 06:16
Uninitialized static data (`.bss`) is initialized to all-bits-zero by the OS before program start; this is actually guaranteed by the C standard. Some embedded systems might not bother, I suppose (I've never seen one, but I don't work all that embedded) – zwol Aug 09 '11 at 16:50
@zwol: Linux has a compile-time option to not zero pages returned by `mmap`, but I'd assume the `.bss` would still be zeroed. BSS space is probably the most compact way to express the fact that a program wants some zerod arrays. – Peter Cordes Oct 01 '15 at 12:06
1

@PeterCordes What the C standard says is that global variables declared with no initializer are treated as-if initialized to zero. A C implementation that puts such variables in `.bss` and doesn't zero `.bss` would therefore be nonconforming. But nothing forces a C implementation to use `.bss` at all or even have such a thing. – zwol Oct 01 '15 at 17:12
@PeterCordes Also, the line between the "C implementation" and the program can be very fuzzy, e.g. there's usually a small chunk of code from the implementation, statically linked into each executable, that runs before `main`; that code could zero out the `.bss` area rather than having the kernel do it, and that would still be conforming. – zwol Oct 01 '15 at 17:16
@zwol: My point is that such an implementation would be worse than having the kernel's ELF-loader zero-initialize the bss. You gain nothing by leaving that out, since you need to work around it to still have a usable C implementation (unless your embedded system doesn't need any zero-initialized arrays). Note that I said "usable", not "conforming": we're talking about an embedded system. If the combo of kernel, compiler, and runtime stuff works for the code you want to run, it's usable. – Peter Cordes Oct 01 '15 at 18:02

What does the brk() system call do?

8 Answers8

Linked

Related