knackforge
February 14, 2012
I have been working on a project that integrates Drupal, Apache Solr, and Mahout to provide a content recommendation engine. It has been an interesting ride with plenty of engaging and amusing challenges. More details about the project will follow in a later post. In this post, I would like to quickly share a trick for deleting a large number of Drupal nodes with a Drush script.
My aim here is to keep about 10 nodes per category and drop the rest. We use the Taxonomy module to categorize articles by subject, such as physics, chemistry, space, and astronomy.
Initially, I tried Views Bulk Operations (VBO), as I remembered reading about it on the support mailing list. The site we were working on had about 10K nodes, and attempts to delete a few thousand at once failed with a fatal "memory exhausted" error. We soon realized that a Drush script was the way forward.
We converted the Views query into a Drush script and tried deleting every node in the result set with a node_delete() call. This worked correctly, but performance was poor: each deletion took about 2 to 3 seconds. At that rate, deleting 10K nodes would take roughly 5.5 to 8 hours.
Fortunately, the node_delete_multiple() API (introduced in Drupal 7) can delete many nodes in a single call with fewer SQL queries, which saves a considerable amount of time. This API does the heavy lifting of cleaning up the node table, the related taxonomy references, the search index, and so on.
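To illustrate the idea, here is a minimal sketch of feeding node IDs to node_delete_multiple() in fixed-size batches, so that a large set is not loaded into memory in one go. The helper name kf_delete_in_batches(), the batch size of 100, and the injectable callback are assumptions made for illustration; they are not part of Drupal or of our original script.

```php
<?php

/**
 * Deletes nodes in fixed-size batches (hypothetical helper, not a Drupal API).
 *
 * @param array $nids
 *   Node IDs to delete.
 * @param int $batch_size
 *   How many node IDs to hand to the delete callback per call (assumed value).
 * @param callable $delete_callback
 *   Receives one array of node IDs per batch. Inside a bootstrapped Drupal 7
 *   site the default, node_delete_multiple(), does the actual deletion.
 *
 * @return int
 *   Number of node IDs handed to the callback.
 */
function kf_delete_in_batches(array $nids, $batch_size = 100, $delete_callback = 'node_delete_multiple') {
  // array_chunk() requires a size of at least 1.
  $batch_size = max(1, (int) $batch_size);
  $processed = 0;
  foreach (array_chunk($nids, $batch_size) as $batch) {
    call_user_func($delete_callback, $batch);
    $processed += count($batch);
  }
  return $processed;
}
```

Passing the delete function as a parameter keeps the batching logic easy to try outside Drupal; in a real Drush script you would simply call kf_delete_in_batches($nids) and tune the batch size to your memory limit.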
Below is the script that we finally ended up with to clean up the unwanted nodes. Within 30 minutes, we had a slim site ready for more experimentation with Mahout classification and clustering. To learn more about that work, please follow Selvam's blog; he recently posted about integrating Solr and the Mahout classifier. As promised, we will share more about it in upcoming posts.
At its core, the code above uses a custom query that returns the node IDs of the superfluous nodes to be deleted. The script can be run from any location within the Drupal file system. In my case, I saved it in the sites/all/modules/custom/kf_scripts/ folder.
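The original script is not reproduced here, so what follows is only a sketch of how such a selection could look, assuming Drupal 7 and its taxonomy_index table (columns nid, tid, created). The function kf_surplus_nids(), the "newest first" ordering, and the rule that a node kept for one term survives even if it is surplus for another are all assumptions for illustration, not our production query.

```php
<?php

/**
 * Returns node IDs that are not among the first $keep rows of any term
 * (hypothetical helper, not a Drupal API).
 *
 * @param array $rows
 *   Rows of array('nid' => ..., 'tid' => ...), already sorted in the order
 *   in which nodes should be kept (for example newest first within a term).
 * @param int $keep
 *   How many nodes to preserve per term.
 */
function kf_surplus_nids(array $rows, $keep = 10) {
  $seen_per_term = array();
  $keep_nids = array();
  $candidate_nids = array();
  foreach ($rows as $row) {
    $tid = $row['tid'];
    $nid = $row['nid'];
    $count = isset($seen_per_term[$tid]) ? $seen_per_term[$tid] : 0;
    if ($count < $keep) {
      // Still within the quota for this term: preserve the node.
      $keep_nids[$nid] = TRUE;
      $seen_per_term[$tid] = $count + 1;
    }
    else {
      $candidate_nids[$nid] = TRUE;
    }
  }
  // A node preserved for one term is never deleted for another.
  return array_keys(array_diff_key($candidate_nids, $keep_nids));
}

// Runs only inside a bootstrapped Drupal 7 site, e.g. via "drush scr".
if (function_exists('db_query') && function_exists('node_delete_multiple')) {
  $rows = db_query('SELECT nid, tid FROM {taxonomy_index} ORDER BY tid, created DESC')
    ->fetchAll(PDO::FETCH_ASSOC);
  // Delete in batches to keep memory usage bounded (batch size is assumed).
  foreach (array_chunk(kf_surplus_nids($rows, 10), 100) as $batch) {
    node_delete_multiple($batch);
  }
}
```

Doing the "keep 10 per term" bookkeeping in PHP rather than in SQL keeps the query portable and simple; on a much larger site you may prefer to push that logic into the query itself. Try it on a copy of the database first, since the deletion cannot be undone.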
To run the script, cd to the directory that contains it and issue drush scr filename. Once the code starts running, the nodes get deleted one after another, which saves you a great deal of manual work when the node count is huge. You will need to alter the query to make this work for your own scenario. I hope this helps.