Debian/Ubuntu Random Disk Expansion

Running mysql 8.0.34 on debian 12 (BookWorm).

I get this random issue where the disk looks like it is being steadily consumed, but when I step into the disk or search the disk there doesn’t seem to be anything that is eating space, it then resets back to normal disk after it reaches full. The entire process takes about 45 minutes.

Fresh install of debian 12, the only service installed on the OS is mysql.

The last two images going from 100% disk consumption back to 44% happens in about 1 second.

Any explanation as to why this would occur or suggestions on debugging much appreciated.

enter image description here

Running lsof +L1

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME

mysqld 886068 mysql 521u REG 254,1 194824503296 0 29360306 /tmp/#29360306 (deleted)

Maybe I should ask the dolphin?

Asked By: Watson

||

Running lsof +L1

COMMAND      PID       USER   FD   TYPE DEVICE     SIZE/OFF NLINK     NODE NAME    
mysqld    886068      mysql  521u   REG  254,1 194824503296     0 29360306 /tmp/#29360306 (deleted)

That’s a temporary file that’s been deleted while it was (and is) held open by mysqld, and it appears to have been about 194G in size when you ran the lsof +L1 command.

If you want to find out what the file contains, you may be able to access it as /proc/<PID number>/fd/<FD number> while the process still exists and holds it open. So in this case, you could try e.g.:

sudo file -L /proc/886068/fd/521

to try and find out what kind of file it actually is.

Could you stop and restart your MySQL database? Once mysqld stops, it would "let go" of that file, and then the filesystem would automatically clean it up. Or if you can find a database session that may have done a silly query that produces near-infinite output, you might be able to just force that session to end, which might cause mysqld to close any temporary files related to that session.

If your MySQL server is reachable from internet, there might be a possibility that some intruder or malware is abusing your database to "hide" some data there.

From the MySQL 8.0.x documentation on temporary files:

MySQL arranges that temporary files are removed if mysqld is terminated. On platforms that support it (such as Unix), this is done by unlinking the file after opening it. The disadvantage of this is that the name does not appear in directory listings and you do not see a big temporary file that fills up the file system in which the temporary file directory is located. (In such cases, lsof +L1 may be helpful in identifying large files associated with mysqld.)

When sorting (ORDER BY or GROUP BY), MySQL normally uses one or two temporary files. The maximum disk space required is determined by the following expression:

(length of what is sorted + sizeof(row pointer)) * number of matched rows * 2

The row pointer size is usually four bytes, but may grow in the future for really big tables.

For some statements, MySQL creates temporary SQL tables that are not hidden and have names that begin with #sql.

Some SELECT queries creates temporary SQL tables to hold intermediate results.

So, it appears that your system is frequently running some queries that have a rather large result set, and either the intermediate results or sorting requires the creation of large temporary files.

If it turns out that a particular query is frequently repeated, and the intermediate file is created by the requirement to sort the results, then you should really consider adding an index that would allow the database to access the appropriate table(s) effectively pre-sorted in the way they’re needed in the frequently-repeated query. That might also give you a significant query performance boost.

Of course, if you examine the actual query and find that an application is creating a silly query that could be implemented in a smarter way, then a bug report (with a suggestion on an improved query, if you can make one) might benefit everyone that’s using the application in question.

(I’m not a real DBA, but years of discussions with DBAs at the water cooler at work have taught me that the lack of appropriate indexing and application-generated queries that search for something in a stupid way seem to be the two most common reasons for poor database performance. They might not matter in development, but once the amount of data grows to production-scale, the inefficiencies become apparent.)

Answered By: telcoM
Categories: Answers Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.