I often build bots that run on a server and gather data. For some of them, I use Puppeteer with headless Chromium instances. That combination works great, but there is one issue: Puppeteer doesn't remove its files from the temp directory. Over time this becomes a problem, because the directory keeps growing.
Have you run into the same issue? Let me help you fix it.
Good to know
This article is meant for people who run Puppeteer on Linux (maybe on Mac too?). If you are using Windows, unfortunately, I can't help. However, you may be able to adapt the general idea to your setup.
Long story short
I solved that issue by adding a crontab task, which deletes all files with a modification time older than 10 days.
If you want more details, read on.
Search Puppeteer temp files
Let's go into the /tmp directory and check where the Puppeteer files are.
# show files from /tmp filter by "puppeteer"
ls /tmp | grep puppeteer
Here we go. Puppeteer creates many directories. Let's count them and measure their total size.
# count of directories
ls /tmp | grep puppeteer | wc -l
# total size of them
du -hc /tmp/puppeteer* | grep total
I got 25782 directories taking up 27 gigabytes. That's huge... Let's move on and automate removing them.
Help Puppeteer remove temp files
If the mountain won't come to me, I will go to the mountain. Let's write a crontab task to remove Puppeteer files that haven't been modified for a long time.
First, let's write a command to find files older than 10 days (for example).
# find dirs and files not modified for more than 10 days
find /tmp/puppeteer* -mtime +10
The command lists files and directories that were last modified more than 10 days ago. Note that find counts age in whole 24-hour periods, so +10 in practice means at least 11 days. To delete them, we can extend the command like this.
# find and delete dirs and files not modified for more than 10 days
find /tmp/puppeteer* -mtime +10 -exec rm -rf {} \;
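Before pointing a delete command at the real /tmp, it may be worth trying it in a sandbox. Here is a minimal sketch that assumes GNU touch and find (typical on Linux). The directory names are made up for the demo, and the form of the command is my variation rather than the exact one above: it searches a directory with -maxdepth 1 and -name instead of a shell glob, so find only tests the top-level entries and doesn't try to descend into directories it has just removed.

```shell
#!/bin/sh
# Sandbox demo: nothing outside the scratch directory is touched.
sandbox=$(mktemp -d)

# Two fake profile directories (hypothetical names); one is backdated by 15 days.
mkdir "$sandbox/puppeteer_dev_profile-old" "$sandbox/puppeteer_dev_profile-new"
touch -d '15 days ago' "$sandbox/puppeteer_dev_profile-old"

# Test only the top-level entries; "+" passes all matches to a single rm call.
find "$sandbox" -maxdepth 1 -name 'puppeteer*' -mtime +10 -exec rm -rf {} +

# Only the fresh directory should be left.
remaining=$(ls "$sandbox")
echo "$remaining"   # prints: puppeteer_dev_profile-new

# Clean up the scratch directory.
rm -rf "$sandbox"
```

If the output looks right, the same options can be pointed at /tmp instead of the sandbox.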
Time to add it as a crontab task. Run the following command to open the crontab editor.
crontab -e
Here is my example. Cron will run that command every day at 4:05 and delete the Puppeteer files that are older than 10 days. I used the full path to the find command to be sure cron would use the correct one.
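The full path to find can differ between machines, so here is a small sketch that looks it up and prints a crontab line matching the description above. The 5 4 * * * schedule (every day at 04:05) and the find command are taken from this article; the variable names are only for illustration, and the resolved path is whatever your system reports.

```shell
#!/bin/sh
# Resolve the absolute path of find on this machine.
find_path=$(command -v find)

# Build the crontab line: minute 5, hour 4, every day, followed by the cleanup command.
cron_line=$(printf '5 4 * * * %s /tmp/puppeteer* -mtime +10 -exec rm -rf {} \\;' "$find_path")

# Print the line so it can be pasted into the editor opened by "crontab -e".
printf '%s\n' "$cron_line"
```

The script only prints the line and does not change your crontab, so you can review it before saving.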
Conclusion
This simple approach cleans up the files that Puppeteer leaves in the temp directory. If you have any questions, ask them below in the comments section.
If you are interested, I have one more article related to headless browsers. Check it out: How to protect content against scrapers with headless browsers?