Safe Clearing of Private Data
By NIIT Editorial
Published on 28/07/2021
Programs often need to store private data such as passwords, secret keys, and values derived from them. After using such data, we need to erase its traces from memory so that a possible intruder can’t gain access to it. This article discusses why the memset() function cannot be relied on to clear private data.
memset()
memset() is often used to erase memory, but there are many scenarios in which it is used incorrectly. Problems can arise when clearing both stack-allocated and dynamically allocated buffers.
The stack
Let’s start with an example that uses a stack-allocated variable.
The following code fragment handles a password:
#include <cstring>
#include <string>
#include <functional>
#include <iostream>
//Private data
struct Private_data
{
    size_t m_hash;
    char m_pswd[100];
};
//Function performs some operations on password
void doSmth(Private_data& data)
{
    std::string s(data.m_pswd);
    std::hash<std::string> hash_fn;
    data.m_hash = hash_fn(s);
}
//Function for password entering and processing
int funcPswd()
{
    Private_data DATA;
    std::cin >> DATA.m_pswd;
    doSmth(DATA);
    //Intended to wipe the password and its hash after use
    memset(&DATA, 0, sizeof(Private_data));
    return 1;
}
int main()
{
    funcPswd();
    return 0;
}
This example is deliberately simple and completely synthetic.
If we build a debug version of the code and run it in the debugger, it appears to work well: both the password and its calculated hash value are erased after use.
Now, let’s have a look at the code in the Visual Studio debugger:
....
doSmth(DATA);
00000013F3072BF lea rcx,[DATA]
00000013F3072C3 call doSmth (013F30153Ch)
memset(&DATA, 0, sizeof(Private_data));
00000013F3072C8 mov r8d,70h
00000013F3072CE xor edx,edx
00000013F3072D0 lea rcx,[DATA]
00000013F3072D4 call memset (013F301352h)
return 1;
00000013F3072D9 mov eax,1
....
We can see the call to the memset() function, which clears the private data after it has been used.
We could stop here, but let’s also build an optimized release version. This is what it looks like in the debugger:
....
00000013F7A1035 call std::operator>><char,std::char_traits<char> > (013F7A18B0h)
00000013F7A103A lea rcx,[rsp+20h]
00000013F7A103F call doSmth (013F7A1170h)
return 0;
00000013F7A1044 xor eax,eax
....
All the instructions associated with the call to memset() have been deleted. The compiler assumes that, since the data is no longer used, there is no need to call a function to erase it. This is not a compiler error but a legal choice. From the language’s point of view, the buffer is not used anywhere further in the program, so removing the memset() call does not change the program’s observable behavior. As a result, the private data stays in memory, which is very bad.
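The reasoning above depends on the compiler knowing exactly what memset() does. The following is a minimal sketch, not part of the original example, of one commonly cited way to take that knowledge away: the call goes through a volatile function pointer, so the callee has to be loaded at run time. The names memset_indirect, wipe_indirect and wipe_indirect_demo are hypothetical, and the language standard does not promise that this trick survives every optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (assumption, not from the article): memset is reached
// through a volatile function pointer, so the compiler must load the pointer
// at run time and cannot assume the callee is the memset it knows how to drop.
static void* (*const volatile memset_indirect)(void*, int, std::size_t) = std::memset;

void wipe_indirect(void* buffer, std::size_t size)
{
    assert(buffer != nullptr || size == 0);
    if (size != 0)
        memset_indirect(buffer, 0, size);
}

// Self-check: returns true if every byte of a sample buffer is zero after the wipe.
bool wipe_indirect_demo()
{
    char secret[16] = "MyTopSecret";
    wipe_indirect(secret, sizeof secret);
    for (char c : secret)
        if (c != 0) return false;
    return true;
}
```

In funcPswd(), the memset() line would become wipe_indirect(&DATA, sizeof(Private_data)). Whether the call really survives should still be confirmed in the release build's disassembly, as is done for the other variants in this article.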
The heap
Now let’s dig a little deeper and see what happens when data is allocated in dynamic memory with the malloc function or the new operator.
Let’s modify the code we used previously to work with malloc:
#include <cstdlib>
#include <cstring>
#include <string>
#include <functional>
#include <iostream>
struct PrivateData
{
    size_t m_hash;
    char m_pswd[100];
};
void doSmth(PrivateData& data)
{
    std::string s(data.m_pswd);
    std::hash<std::string> hash_fn;
    data.m_hash = hash_fn(s);
}
int funcPswd()
{
    PrivateData* data = (PrivateData*)malloc(sizeof(PrivateData));
    std::cin >> data->m_pswd;
    doSmth(*data);
    //Intended to wipe the block before it is returned to the heap
    memset(data, 0, sizeof(PrivateData));
    free(data);
    return 1;
}
int main()
{
    funcPswd();
    return 0;
}
The debug version has all the calls where they should be, so we will test the release version only. This is the assembly code produced by Visual Studio 2015:
....
00000013FBB1021 mov rcx,qword ptr [__imp_std::cin (013FBB30D8h)]
00000013FBB1028 mov rbx,rax
00000013FBB102B lea rdx,[rax+8]
00000013FBB102F call std::operator>><char,std::char_traits<char> > (013FBB18B0h)
00000013FBB1034 mov rcx,rbx
00000013FBB1037 call doSmth (013FBB1170h)
00000013FBB103C xor edx,edx
00000013FBB103E mov rcx,rbx
00000013FBB1041 lea r8d,[rdx+70h]
00000013FBB1045 call memset (013FBB2A2Eh)
00000013FBB104A mov rcx,rbx
00000013FBB104D call qword ptr [__imp_free (013FBB3170h)]
return 0;
00000013FBB1053 xor eax,eax
....
Visual Studio does well here and erases the data as planned. But we also need to test other compilers, namely gcc 5.2.1 and clang 3.7.0.
For gcc and clang, the code has been modified to print the contents of the allocated memory block both before and after the cleanup. Note that the second printout reads the block after it has been freed. This must never be done in real-life programs, because there is no telling how the application will respond; it is done here only for the sake of the experiment.
....
#include "string.h"
....
size_t len = strlen(data->m_pswd);
for (int i = 0; i < len; ++i) printf("%c", data->m_pswd[i]);
printf("| %zu \n", data->m_hash);
memset(data, 0, sizeof(PrivateData));
free(data);
for (int i = 0; i < len; ++i) printf("%c", data->m_pswd[i]);
printf("| %zu \n", data->m_hash);
....
Here is the relevant fragment of the assembly code generated by gcc:
movq (%r12), %rsi
movl $.LC2, %edi
xorl %eax, %eax
call printf
movq %r12, %rdi
call free
The call to memset() has been removed: the printing function (printf) is followed directly by the call to free(). If we run the code and enter an arbitrary password (for example, “MyTopSecret”), the following is printed on the screen:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
The hash has changed, which is a side effect of the memory manager’s work. The password “MyTopSecret”, however, remains intact in memory.
Now, let’s see how this works with clang:
movq (%r14), %rsi
movl $.L.str.1, %edi
xorl %eax, %eax
callq printf
movq %r14, %rdi
callq free
Just like gcc, clang removes the memset() call, and the printed output is the same:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
So both gcc and clang decided to optimize the code. Since the memory is freed right after the memset() call, these compilers treat the call as irrelevant and delete it.
These experiments show that compilers can remove the memset() call during optimization for both stack and dynamic memory.
Now let’s see how the compilers respond when the new operator is used to allocate memory.
Here is the modified code:
#include <string>
#include <functional>
#include <iostream>
#include "string.h"
struct PrivateData
{
    size_t m_hash;
    char m_pswd[100];
};
void doSmth(PrivateData& data)
{
    std::string s(data.m_pswd);
    std::hash<std::string> hash_fn;
    data.m_hash = hash_fn(s);
}
int funcPswd()
{
    PrivateData* data = new PrivateData();
    std::cin >> data->m_pswd;
    doSmth(*data);
    memset(data, 0, sizeof(PrivateData));
    delete data;
    return 1;
}
int main()
{
    funcPswd();
    return 0;
}
The memory is erased as expected by Visual Studio:
000000013FEB1044 call doSmth (013FEB1180h)
000000013FEB1049 xor edx,edx
000000013FEB104B mov rcx,rbx
000000013FEB104E lea r8d,[rdx+70h]
000000013FEB1052 call memset (013FEB2A3Eh)
000000013FEB1057 mov edx,70h
000000013FEB105C mov rcx,rbx
000000013FEB105F call operator delete (013FEB1BA8h)
return 0;
000000013FEB1064 xor eax,eax
The gcc compiler also keeps the clearing code, inlined here as a rep stosq sequence:
call printf
movq %r13, %rdi
movq %rbp, %rcx
xorl %eax, %eax
andq $-8, %rdi
movq $0, 0(%rbp)
movq $0, 104(%rbp)
subq %rdi, %rcx
addl $112, %ecx
shrl $3, %ecx
rep stosq
movq %rbp, %rdi
call _ZdlPv
Accordingly, the printed output changes; the data entered earlier are no longer present there:
MyTopSecret| 7882334103340833743
| 0
Clang, on the other hand, once again optimized the code and removed the “unnecessary” call:
movq (%r14), %rsi
movl $.L.str.1, %edi
xorl %eax, %eax
callq printf
movq %r14, %rdi
callq _ZdlPv
Printing the contents of the memory gives:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
The password is still there and can easily be stolen.
To sum up, an optimizing compiler may eliminate the memset() call regardless of whether stack or dynamic memory is used. Visual Studio did not remove the memset() call for dynamic memory in these tests, but it cannot be expected to always behave this way in real-life code; the harmful effect may well show up with other compilation switches. The conclusion from these experiments is that a memset() call cannot be relied on to clear private data.
So, is there a better way to clear private data?
Use special memory-clearing functions that the compiler is not allowed to remove when it optimizes the code.
In Visual Studio, for example, RtlSecureZeroMemory can be used. Starting with C11, there is also the memset_s function, although it belongs to the optional Annex K and is not provided by every standard library. If necessary, a safe function of your own can also be implemented; many guides and examples are available on the internet. Some of them are listed as numbered solutions below.
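For the platform-provided route, a thin wrapper may be all that is needed. The sketch below is an illustration under stated assumptions, not a definitive implementation: wipe_platform and wipe_platform_demo are hypothetical names, explicit_bzero is a glibc (2.25 and later) and BSD extension that the article does not mention, and the final branch falls back to a volatile byte loop when neither platform function is detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // on glibc, <string.h> also declares explicit_bzero

#if defined(_WIN32)
#include <windows.h> // declares RtlSecureZeroMemory
#endif

// Hypothetical wrapper: uses whichever non-removable clearing primitive
// the platform is known to offer, otherwise a volatile byte loop.
void wipe_platform(void* buffer, std::size_t size)
{
    assert(buffer != nullptr || size == 0);
    if (size == 0) return;
#if defined(_WIN32)
    RtlSecureZeroMemory(buffer, size);
#elif defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 25))
    explicit_bzero(buffer, size);
#else
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buffer);
    while (size--) *p++ = 0;
#endif
}

// Self-check: returns true if a sample buffer is fully zeroed by the wrapper.
bool wipe_platform_demo()
{
    char secret[32] = "MyTopSecret";
    wipe_platform(secret, sizeof secret);
    for (char c : secret)
        if (c != 0) return false;
    return true;
}
```

Keeping the platform checks in one function means the rest of the code base calls a single name, and the choice of primitive can be revisited in one place.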
Solution No. 1.
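/* errno_t, rsize_t and RSIZE_MAX come from C11 Annex K; define them yourself if your library lacks them */
errno_t memset_s(void *vl, rsize_t s_max, int d, rsize_t n) {
    if (vl == NULL) return EINVAL;
    if (s_max > RSIZE_MAX) return EINVAL;
    if (n > s_max) return EINVAL;
    /* writes through a volatile pointer cannot be optimized away */
    volatile unsigned char *p = vl;
    while (s_max-- && n--) {
        *p++ = (unsigned char)d;
    }
    return 0;
}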
Solution No. 2.
void secure_zero(void *s, size_t n)
{
volatile char *p = s;
while (n--) *p++ = 0;
}
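In C++ code such as the PrivateData example, the clearing function can be tied to scope exit so that the wipe also happens on early returns and exceptions. The sketch below is one possible arrangement rather than the article's method: ScopedWipe and scoped_wipe_demo are hypothetical names, and secure_zero is restated with an explicit cast so that it compiles as C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same idea as Solution No. 2, with the cast that C++ requires.
inline void secure_zero(void* s, std::size_t n)
{
    assert(s != nullptr || n == 0);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(s);
    while (n--) *p++ = 0;
}

struct PrivateData
{
    std::size_t m_hash;
    char m_pswd[100];
};

// Hypothetical guard: wipes the referenced object when the guard is destroyed.
class ScopedWipe
{
public:
    explicit ScopedWipe(PrivateData& target) : m_target(target) {}
    ~ScopedWipe() { secure_zero(&m_target, sizeof m_target); }
    ScopedWipe(const ScopedWipe&) = delete;
    ScopedWipe& operator=(const ScopedWipe&) = delete;
private:
    PrivateData& m_target;
};

// Self-check: fills the structure inside a guarded scope and
// returns true if it is fully zeroed once the scope has ended.
bool scoped_wipe_demo()
{
    PrivateData data{};
    {
        ScopedWipe guard(data);
        std::strcpy(data.m_pswd, "MyTopSecret");
        data.m_hash = 42;
    } // the guard's destructor runs here
    if (data.m_hash != 0) return false;
    for (char c : data.m_pswd)
        if (c != 0) return false;
    return true;
}
```

In funcPswd(), the guard would be declared right after the data object, which removes the need for a manual clearing call at every return statement.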
Some programmers go even further and create functions that fill the array with pseudo-random values and have varying running times in order to hinder timing attacks. Implementations of these functions can also be found on the internet.
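As a rough illustration of the pseudo-random fill only (it does not attempt the varying running time), the sketch below overwrites a buffer with generated bytes and then with zeros, both through a volatile pointer. The names wipe_with_noise and wipe_with_noise_demo are hypothetical, and std::mt19937 is an assumption made for brevity; it is not a cryptographic generator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: pass 1 writes pseudo-random bytes, pass 2 writes zeros.
// Both passes go through a volatile pointer so the stores are not dropped.
void wipe_with_noise(void* buffer, std::size_t size)
{
    assert(buffer != nullptr || size == 0);
    std::mt19937 gen{std::random_device{}()};          // not cryptographically secure
    std::uniform_int_distribution<int> byte(0, 255);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buffer);
    for (std::size_t i = 0; i < size; ++i)
        p[i] = static_cast<unsigned char>(byte(gen));
    for (std::size_t i = 0; i < size; ++i)
        p[i] = 0;
}

// Self-check: returns true if the buffer ends up fully zeroed.
bool wipe_with_noise_demo()
{
    unsigned char key[32];
    for (std::size_t i = 0; i < sizeof key; ++i)
        key[i] = static_cast<unsigned char>(i + 1);
    wipe_with_noise(key, sizeof key);
    for (unsigned char b : key)
        if (b != 0) return false;
    return true;
}
```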
Conclusion
The PVS-Studio static analyzer can detect the data-clearing errors discussed here; it uses diagnostic V597 to signal the problem. This article serves as an extended explanation of that diagnostic and of why it is important. Unfortunately, many programmers still think that there is nothing to worry about and that the analyzer “picks on” their code. This is because they see the memset() call in the debugger and forget that what they are looking at is only the debug version. To learn more, check out the courses offered by NIIT.