Precise sleep on Windows

Well… this blog post sat as a draft for almost a year, so I decided to publish it as-is even though it is not as thorough as I initially wanted it to be…

Recently I read an interesting blog post on Windows and high resolution timers by Clément Grégoire, and it reminded me of an old blog post I read a while ago, “Making an accurate Sleep() function” by Blat Blatnik.
I re-read it and realized there was a more recent version, “The perfect Sleep() function”.

I got a bit nerd-sniped and decided to try to combine both approaches.

Before going further, I encourage you to read the aforementioned blog posts as I won’t go into as much detail.

Precise sleep using high resolution timer

Concept

The gist of the technique is to sleep only as long as it is “safe” to do so, i.e. without overshooting the scheduler period.

So for sleep durations longer than the scheduler period, the sleep is split into multiple sleeps, each shorter than the scheduler period.
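To make the splitting concrete, here is a minimal portable sketch. `SplitSleep100ns` is a hypothetical helper (it is not part of the implementation further down) and it only plans the slices in 100ns units; it does not sleep.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: plan a sleep of `total_100ns` as a list of slices,
// each no longer than `max_slice_100ns` (all values are in 100ns units).
std::vector<std::int64_t> SplitSleep100ns(std::int64_t total_100ns, std::int64_t max_slice_100ns)
{
    assert(max_slice_100ns > 0); // a zero-length slice would never make progress

    std::vector<std::int64_t> slices;
    while(total_100ns > 0)
    {
        // Take a full slice, or whatever is left if that is shorter
        const std::int64_t slice = total_100ns < max_slice_100ns ? total_100ns : max_slice_100ns;
        slices.push_back(slice);
        total_100ns -= slice;
    }
    return slices;
}
```

For instance, assuming a 1ms scheduler period and slices capped at 9,500 × 100ns, a 2.5ms sleep (25,000 × 100ns) would be planned as 9,500 + 9,500 + 6,000.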

There are then two approaches: just query the actual scheduler period and use that for the splitting, or enforce the scheduler period via a call to timeBeginPeriod or the undocumented NtSetTimerResolution.

Enforcing the scheduler period changes it for all processes so it should be done with care as it can increase power consumption.

Querying the timer capabilities

In the provided code, I query the minimum period supported (i.e. the highest timing precision) and set the scheduler period to it. But you can set any supported period in the range returned by timeGetDevCaps or the undocumented NtQueryTimerResolution.

timeGetDevCaps

In timeapi.h

/* timer device capabilities data structure */
typedef struct timecaps_tag {
    UINT    wPeriodMin;     /* minimum period supported  */
    UINT    wPeriodMax;     /* maximum period supported  */
} TIMECAPS, *PTIMECAPS, NEAR *NPTIMECAPS, FAR *LPTIMECAPS;
TIMECAPS caps;
timeGetDevCaps(&caps, sizeof caps);

NtQueryTimerResolution

NtQueryTimerResolution returns the resolution of the system timer in the context of the calling process.

  • MinimumResolution: the highest possible delay (in 100-ns units) between timer events, i.e. the coarsest resolution.
  • MaximumResolution: the lowest possible delay (in 100-ns units) between timer events, i.e. the finest resolution.
  • CurrentResolution: the current timer resolution, in 100-ns units.
#ifdef __cplusplus
extern "C" {
#endif
// Let the linker import the function
NTSYSAPI NTSTATUS NTAPI NtQueryTimerResolution(PULONG MinimumResolution, PULONG MaximumResolution, PULONG CurrentResolution);
#ifdef __cplusplus
}
#endif
#pragma comment(lib, "ntdll.lib")

// https://ntdoc.m417z.com/ntsettimerresolution
ULONG min_timer_resolution_100ns = 0;
ULONG max_timer_resolution_100ns = 0;
ULONG curr_timer_resolution_100ns = 0;
if(NtQueryTimerResolution(&min_timer_resolution_100ns, &max_timer_resolution_100ns, &curr_timer_resolution_100ns) != 0) // 0 == STATUS_SUCCESS
{
  fprintf(stderr, "NtQueryTimerResolution failed.\n");
}

Setting the timer period

timeBeginPeriod

The “safe and documented” way to set the timer period is via timeBeginPeriod, which expects a period in whole milliseconds, so effectively we cannot go below 1ms.
This means that for sleep times lower than 1ms, we will have to spin to ensure a high precision.
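As an illustration of what “spinning” means here, the following is a minimal portable sketch. `SpinFor` is a hypothetical helper, and `std::chrono::steady_clock` is used as a stand-in for QueryPerformanceCounter; the actual implementation later in the post spins on QueryPerformanceCounter and calls YieldProcessor() in the loop.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: busy-wait until `duration` has elapsed.
// The thread never yields to the scheduler, so the wait is precise but burns CPU.
void SpinFor(std::chrono::nanoseconds duration)
{
    assert(duration.count() >= 0);

    const auto target = std::chrono::steady_clock::now() + duration;
    while(std::chrono::steady_clock::now() < target)
    {
        // Intentionally empty: keep polling the clock until the target is reached
    }
}
```

Calling `SpinFor(std::chrono::microseconds(600))` would cover a 0.6ms wait entirely by spinning, which is the kind of sub-millisecond case the scheduler cannot serve with a 1ms period.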

The Remarks section of the timeBeginPeriod documentation states that we have to call timeEndPeriod with the same period as timeBeginPeriod once we are done:
“Call this function immediately before using timer services, and call the timeEndPeriod function immediately after you are finished using the timer services. You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls.”

NtSetTimerResolution

On my system, NtQueryTimerResolution returns a range of [5000, 156250]x100ns i.e. [0.5, 15.625]ms.
This means that we can achieve twice the precision compared to timeBeginPeriod(1) by setting a period of 0.5ms via NtSetTimerResolution(5000, TRUE, ...).
This will allow us to reduce spinning for sleep durations below 1ms (and above 0.5ms).
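The unit conversion behind these numbers can be sketched as follows. Both helpers are hypothetical names introduced for illustration only; they just convert between milliseconds and the 100ns units used by NtQueryTimerResolution and NtSetTimerResolution.

```cpp
#include <cassert>

// Hypothetical helper: convert a timer resolution from 100ns units to milliseconds.
double Resolution100nsToMs(unsigned long resolution_100ns)
{
    return resolution_100ns / 10000.0; // 1ms = 10,000 x 100ns
}

// Hypothetical helper: convert a period in milliseconds to 100ns units,
// rounding to the nearest unit.
unsigned long MsToResolution100ns(double period_ms)
{
    assert(period_ms >= 0.0);
    return (unsigned long)(period_ms * 10000.0 + 0.5);
}
```

With the range reported above, `Resolution100nsToMs(5000)` gives 0.5 and `Resolution100nsToMs(156250)` gives 15.625, while `MsToResolution100ns(0.5)` gives the 5000 passed to NtSetTimerResolution.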

However, I don’t know which consumes more power: “more spin looping” or “less spin looping with a lower scheduler period”…

Also, following the same logic as timeBeginPeriod/timeEndPeriod, I restore the previous scheduler period when uninitializing the precise sleep function.

Implementation

The implementation I ended up with is similar to Blat’s albeit a bit terser and optimized.

#include <Windows.h>
#pragma comment(lib, "Winmm.lib") // timeGetDevCaps, timeBeginPeriod

#ifndef K_ENABLE_PRECISE_SLEEP_LOGGING
#define K_ENABLE_PRECISE_SLEEP_LOGGING 0 // Enable/Disable logging
#endif

#if K_ENABLE_PRECISE_SLEEP_LOGGING
#include <stdio.h>
#define K_PRECISE_SLEEP_LOG(...) printf(__VA_ARGS__)
#define K_PRECISE_SLEEP_ERR(...) fprintf(stderr, __VA_ARGS__)
#else
#define K_PRECISE_SLEEP_LOG(...) (void)sizeof(0, __VA_ARGS__)
#define K_PRECISE_SLEEP_ERR(...) (void)sizeof(0, __VA_ARGS__)
#endif

// Performance-counter frequency in "counts/second"
static INT64 g_performance_counter_frequency;

// Scheduler period in ms
static double g_scheduler_period_ms;

// Windows high resolution timer
static HANDLE g_high_resolution_timer;

// Max sleep time we perform to keep high precision (in 100ns)
static INT64 g_high_resolution_timer_max_sleep_time_100ns;

// Conversion factor between performance counter and '100ns'
static double g_pc_to_100ns;

// Tolerance to avoid overshooting due to timer setup overhead (in performance counter counts)
static INT64 g_high_resolution_timer_tolerance_pc;

// Tolerance to avoid overshooting due to timer setup overhead (in 100ns)
static INT64 g_high_resolution_timer_tolerance_100ns;

// Tells whether we successfully set the scheduler period via NtSetTimerResolution
static BOOLEAN g_is_using_nt_timer_resolution = FALSE;

// Previous timer resolution returned by NtQueryTimerResolution
static ULONG g_prev_timer_resolution_100ns = 0;

// High precision sleep function using Windows high resolution timer
void PreciseSleep_HighResolutionTimer(double seconds)
{
	LARGE_INTEGER pc;
	QueryPerformanceCounter(&pc);
	const INT64 target_pc = (INT64)(pc.QuadPart + seconds * g_performance_counter_frequency);

	INT64 remaining_sleep_time_pc = target_pc - pc.QuadPart;
	while(remaining_sleep_time_pc > g_high_resolution_timer_tolerance_pc)
	{
		INT64 remaining_sleep_time_100ns = (INT64)(remaining_sleep_time_pc * g_pc_to_100ns - g_high_resolution_timer_tolerance_100ns);

		// Split the sleep time in intervals of time representing a fraction of the scheduler period to avoid oversleep
		if(remaining_sleep_time_100ns > g_high_resolution_timer_max_sleep_time_100ns)
			remaining_sleep_time_100ns = g_high_resolution_timer_max_sleep_time_100ns;

		// SetWaitableTimerEx expects a due time expressed in 100ns units:
		//  - positive values indicate absolute time.
		//  - negative values indicate relative time.
		// https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setwaitabletimer
		LARGE_INTEGER due_time_100ns;
		due_time_100ns.QuadPart = -remaining_sleep_time_100ns;
		SetWaitableTimerEx(g_high_resolution_timer, &due_time_100ns, 0, NULL, NULL, NULL, 0);
		WaitForSingleObject(g_high_resolution_timer, INFINITE);

		QueryPerformanceCounter(&pc);
		remaining_sleep_time_pc = target_pc - pc.QuadPart;
	}

	while(pc.QuadPart < target_pc) // Spin for any remaining time
	{
		YieldProcessor();
		QueryPerformanceCounter(&pc);
	}
}

// High precision sleep function using the system Sleep
void PreciseSleep_SystemSleep(double seconds)
{
	LARGE_INTEGER pc;
	QueryPerformanceCounter(&pc);
	const INT64 target_pc = (INT64)(pc.QuadPart + seconds * g_performance_counter_frequency);

	const double k_sleep_tolerance_s = 0.000'02;
	const double sleep_duration_ms = (seconds - k_sleep_tolerance_s) * 1000.0 - g_scheduler_period_ms; // Sleep for 1 scheduler period less than requested.
	const int num_sleep_slices = (int)(sleep_duration_ms / g_scheduler_period_ms);
	if(num_sleep_slices > 0)
		Sleep((DWORD)(num_sleep_slices * g_scheduler_period_ms));

	QueryPerformanceCounter(&pc);
	while(pc.QuadPart < target_pc) // Spin for any remaining time
	{
		YieldProcessor();
		QueryPerformanceCounter(&pc);
	}
}

#ifdef __cplusplus
extern "C" {
#endif
// Let the linker import the functions
NTSYSAPI NTSTATUS NTAPI NtSetTimerResolution(ULONG DesiredResolution, BOOLEAN SetResolution, PULONG CurrentResolution);
NTSYSAPI NTSTATUS NTAPI NtQueryTimerResolution(PULONG MinimumResolution, PULONG MaximumResolution, PULONG CurrentResolution);
#ifdef __cplusplus
}
#endif
#pragma comment(lib, "ntdll.lib")

// Precise sleep function
void(*PreciseSleep)(double seconds) = NULL;

void InitPreciseSleepFunction(int use_nt_set_timer_resolution)
{
	g_is_using_nt_timer_resolution = FALSE;

	// First try to set a potentially more precise scheduler period via (the undocumented) NtSetTimerResolution
	// http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/Time/NtQueryTimerResolution.html
	// http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/Time/NtSetTimerResolution.html
	if(use_nt_set_timer_resolution)
	{
		ULONG min_timer_resolution_100ns = 0;
		ULONG max_timer_resolution_100ns = 0;
		ULONG curr_timer_resolution_100ns = 0;
		if(NtQueryTimerResolution(&min_timer_resolution_100ns, &max_timer_resolution_100ns, &curr_timer_resolution_100ns) == 0)
		{
			g_prev_timer_resolution_100ns = curr_timer_resolution_100ns;
			K_PRECISE_SLEEP_LOG("NtQueryTimerResolution: current resolution = %lu | range = [%lu, %lu]\n", curr_timer_resolution_100ns, max_timer_resolution_100ns, min_timer_resolution_100ns);
			ULONG actual_timer_resolution_100ns;
			if(NtSetTimerResolution(max_timer_resolution_100ns, TRUE, &actual_timer_resolution_100ns) == 0)
			{
				const double curr_timer_resolution_ms = actual_timer_resolution_100ns / 10'000.0;
				g_scheduler_period_ms = curr_timer_resolution_ms;
				g_is_using_nt_timer_resolution = TRUE;
			}
			else
			{
				K_PRECISE_SLEEP_ERR("NtSetTimerResolution failed.\n");
			}
		}
		else
		{
			K_PRECISE_SLEEP_ERR("NtQueryTimerResolution failed.\n");
		}
	}

	if(!g_is_using_nt_timer_resolution)
	{
		// https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timegetdevcaps
		TIMECAPS caps;
		MMRESULT r = timeGetDevCaps(&caps, sizeof caps);
		if(r != TIMERR_NOERROR)
			K_PRECISE_SLEEP_ERR("timeGetDevCaps failed with error %u\n", r);

		g_scheduler_period_ms = caps.wPeriodMin;
		
		K_PRECISE_SLEEP_LOG("InitPreciseSleepFunction: timeBeginPeriod(%f)\n", g_scheduler_period_ms);
		r = timeBeginPeriod((UINT)g_scheduler_period_ms);
		if(r != TIMERR_NOERROR)
			K_PRECISE_SLEEP_ERR("timeBeginPeriod(%f) failed with error %u\n", g_scheduler_period_ms, r);
	}

	K_PRECISE_SLEEP_LOG("Scheduler period = %fms\n", g_scheduler_period_ms);

	LARGE_INTEGER qpf;
	QueryPerformanceFrequency(&qpf);
	g_performance_counter_frequency = qpf.QuadPart;

	// 'Performance counter' -> '100ns' conversion factor
	g_pc_to_100ns = 10'000'000.0 / g_performance_counter_frequency;

	g_high_resolution_timer = CreateWaitableTimerExW(NULL, NULL, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);

	const double high_resolution_timer_tolerance_s = (g_scheduler_period_ms + 0.02) / 1000.0;
	g_high_resolution_timer_tolerance_100ns = (INT64)(high_resolution_timer_tolerance_s * 10'000'000);
	g_high_resolution_timer_tolerance_pc = (INT64)(high_resolution_timer_tolerance_s * g_performance_counter_frequency);

	// Split the sleep time in intervals of time representing 95% of the scheduler period
	// > "High resolution timer has a quirk that if you request a sleep period longer than
	//    the system timer period, the precision of the timer plummets."
	//    (https://blog.bearcats.nl/perfect-sleep-function/)
	// Cast the product, not the period: (INT64)0.5 would truncate a 0.5ms period to 0
	g_high_resolution_timer_max_sleep_time_100ns = (INT64)(g_scheduler_period_ms * 9'500); // 0.95 * ms_to_100ns = 0.95 * 10'000

	if(g_high_resolution_timer)
		PreciseSleep = PreciseSleep_HighResolutionTimer;
	else
		PreciseSleep = PreciseSleep_SystemSleep;
}

void DeinitPreciseSleepFunction()
{
	if(g_is_using_nt_timer_resolution)
	{
		ULONG actual_timer_resolution_100ns;
		if(NtSetTimerResolution(g_prev_timer_resolution_100ns, TRUE, &actual_timer_resolution_100ns) != 0)
		{
			K_PRECISE_SLEEP_ERR("NtSetTimerResolution failed.\n");
		}
	}
	else
	{
		// https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod (in 'Remark' section)
		//   Call this function immediately before using timer services, and call the timeEndPeriod function immediately after you are finished using the timer services.
		//   You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls.
		K_PRECISE_SLEEP_LOG("DeinitPreciseSleepFunction: timeEndPeriod(%f)\n", g_scheduler_period_ms);
		const MMRESULT r = timeEndPeriod((UINT)g_scheduler_period_ms);
		if(r != TIMERR_NOERROR)
			K_PRECISE_SLEEP_ERR("timeEndPeriod(%f) failed with error %u\n", g_scheduler_period_ms, r);
	}
}

In your code, you have to include "precise_sleep.h", initialize the sleep function, and then use PreciseSleep:

#include "precise_sleep.h"

int main()
{
    const int use_nt_set_timer_resolution = 1;
    InitPreciseSleepFunction(use_nt_set_timer_resolution);
    
    // Your app code that uses PreciseSleep(sleep_duration_s);
    
    DeinitPreciseSleepFunction();
  
    return 0;
}

Results

I also implemented a simple test app with some profiling using Tracy that you can find here.

1.1ms

timeBeginPeriod(1.0)

Command: run --sleep_duration_ms 1.1

Sleep duration 1.1ms - timeBeginPeriod(1.0)

We are spinning ~41% of the time and sleeping ~48% of the time.

Sleep precision - duration 1.1ms - timeBeginPeriod(1.0)

NtSetTimerResolution(0.5)

Command: run --use_nt_set_timer_resolution --sleep_duration_ms 1.1

Sleep duration 1.1ms - NtSetTimerResolution(0.5)

We are spinning ~19% of the time and sleeping ~76% of the time.
So, as expected, with the more precise timer resolution we can significantly reduce the time we spend spinning while still maintaining great accuracy.

Sleep precision - duration 1.1ms - NtSetTimerResolution(0.5)

1.0ms

timeBeginPeriod(1.0)

Command: run --sleep_duration_ms 1.0

Sleep duration 1.0ms - timeBeginPeriod(1.0)

We are spinning ~79% of the time. We never sleep because the scheduler period is equal to the sleep duration and we take some extra margin to avoid oversleeping.
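The “extra margin” can be worked out from the implementation above: the tolerance is the scheduler period plus 0.02ms, and a timer wait only happens while the remaining time exceeds that tolerance. The sketch below mirrors that arithmetic in simplified form (it ignores the integer truncation of the real code and assumes the intended cap of 95% of the period); `FirstTimerWaitMs` is a hypothetical helper, not part of the post's code.

```cpp
#include <cassert>

// Hypothetical helper: duration (in ms) of the first timer wait for a requested
// sleep, or 0 when the whole sleep is spent spinning.
double FirstTimerWaitMs(double sleep_ms, double scheduler_period_ms)
{
    assert(scheduler_period_ms > 0.0);

    const double tolerance_ms = scheduler_period_ms + 0.02; // margin left for the spin loop
    const double max_wait_ms  = scheduler_period_ms * 0.95; // assumed cap: 95% of the period

    if(sleep_ms <= tolerance_ms)
        return 0.0; // not enough room to sleep safely: spin only

    const double wait_ms = sleep_ms - tolerance_ms;
    return wait_ms < max_wait_ms ? wait_ms : max_wait_ms;
}
```

With a 1ms period the tolerance is 1.02ms, so `FirstTimerWaitMs(1.0, 1.0)` and `FirstTimerWaitMs(0.6, 1.0)` both return 0 (spin only). With a 0.5ms period the tolerance drops to 0.52ms, so the same requests leave room for a timer wait.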

Sleep precision - duration 1.0ms - timeBeginPeriod(1.0)

NtSetTimerResolution(0.5)

Command: run --use_nt_set_timer_resolution --sleep_duration_ms 1.0

Sleep duration 1.0ms - NtSetTimerResolution(0.5)

We are spinning ~18% of the time and sleeping ~78% of the time.
Here we are allowed to sleep since the scheduler period is well below the sleep duration.

Sleep precision - duration 1.0ms - NtSetTimerResolution(0.5)


As with the 1.1ms experiment, we spend a lot less time spinning while maintaining a good sleep duration accuracy.

0.6ms

timeBeginPeriod(1.0)

Command: run --sleep_duration_ms 0.6

Sleep duration 0.6ms - timeBeginPeriod(1.0)

We are spinning ~79% of the time and we never sleep because the scheduler period is greater than the sleep duration.

Sleep precision - duration 0.6ms - timeBeginPeriod(1.0)

NtSetTimerResolution(0.5)

Command: run --use_nt_set_timer_resolution --sleep_duration_ms 0.6

Sleep duration 0.6ms - NtSetTimerResolution(0.5)

We are spinning ~35% of the time and sleeping ~56% of the time.
Here we are allowed to sleep since the scheduler period is smaller than the sleep duration.

Sleep precision - duration 0.6ms - NtSetTimerResolution(0.5)


As with the 1.1ms and 1.0ms experiments, we spend a lot less time spinning while maintaining a good sleep duration accuracy.

This post is licensed under CC BY 4.0 by the author.
