
How to bulk delete Drupal nodes with a drush script

I have been working on an interesting project that integrates Drupal, Apache Solr and Mahout to provide a content recommendation engine. Needless to say, it has been an interesting ride filled with plenty of amusing challenges. More details about this project will be shared in a follow-up post. In this post, I would like to quickly share a trick for deleting a large batch of Drupal nodes using a drush script.


My aim here is to preserve about 10 nodes per category and drop the rest. We use the Taxonomy module to categorize articles by subject, such as Physics, Chemistry, Space, Astronomy, etc.


Initially I attempted to use VBO (Views Bulk Operations), as I remembered reading about it on the support mailing list. The site we were working on had about 10K nodes, and attempts to delete a few thousand of them together threw a memory-exhausted fatal error. We soon realized that a drush script was the way forward.


We transformed the Views query into a drush script and tried deleting every node in the result set with a node_delete() call. This worked flawlessly, but the performance was poor: it took about 2 to 3 seconds per node deleted. I couldn't imagine how long it would take to delete 10K nodes that way.
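The per-node approach we started with looked roughly like the sketch below. Here node_delete() is a stub standing in for Drupal 7 core's function, and the nids are hypothetical; in the real script each call runs a full delete cycle (hooks, field cleanup, cache clears), which is what made it so slow.

```php
<?php
// Stub standing in for Drupal 7 core's node_delete(); core would remove the
// node record, its revisions, field data, and fire the delete hooks.
function node_delete($nid) {
  echo "deleted node $nid\n";
}

// Hypothetical nids from the Views result set.
$nids = array(101, 102, 103);
foreach ($nids as $nid) {
  node_delete($nid); // one full delete cycle per node -- ~2-3 seconds each
}
```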


Thankfully, the newer API node_delete_multiple() can delete multiple nodes in a single call with far fewer SQL queries, which saves a considerable amount of time. This API does the heavy lifting of cleaning up the node table, references in the term data table, the search index, etc.


Below is the script we finally ended up with to clean up the unwanted nodes. Within 30 minutes we had a slim site ready for more experimentation with Mahout classification and clustering. To learn more about this work, follow Selvam's blog; he recently posted about integrating Solr and a Mahout classifier, and as promised we will share more in upcoming posts.


<?php
set_time_limit(0);
for ($tid = 1; $tid < 9; $tid++) {
  $query = db_query("SELECT node.nid AS nid
    FROM
    {node} node
    LEFT JOIN {field_data_field_news_category} field_data_field_news_category ON node.nid = field_data_field_news_category.entity_id AND (field_data_field_news_category.entity_type = 'node' AND field_data_field_news_category.deleted = '0')
    INNER JOIN {taxonomy_term_data} taxonomy_term_data_field_data_field_news_category ON field_data_field_news_category.field_news_category_tid = taxonomy_term_data_field_data_field_news_category.tid
    WHERE ((node.status = '1') AND (node.type IN ('press_release')) AND (field_data_field_news_category.field_news_category_tid = :tid))
    ORDER BY node.created DESC
    LIMIT 999999 OFFSET 10", array(':tid' => $tid));
  echo "Running tid $tid \n";
  $nids = array();
  $start = microtime(TRUE);
  while ($result = $query->fetch()) {
    $nids[] = $result->nid;
  }
  if (empty($nids)) {
    continue;
  }
  $count = count($nids);
  echo "Queuing nodes " . implode(', ', $nids) . "\n";
  echo "Calling node_delete_multiple() to delete $count nodes \n";
  node_delete_multiple($nids);
  $taken = microtime(TRUE) - $start;
  echo "Deleted $count nodes in $taken seconds \n";
}


At its core, the script above uses a custom query that returns the node IDs of the superfluous nodes to be deleted. The script can live anywhere within the Drupal file system; in my case I saved it in the sites/all/modules/custom/kf_scripts/ folder.
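If your per-term result sets grow much larger than ours, a single node_delete_multiple() call over tens of thousands of nids can itself run into memory limits. One possible variant is to split the list into fixed-size batches with array_chunk(), as sketched below. node_delete_multiple() is stubbed here and the nid list is hypothetical; in the real drush script they come from Drupal core and the query respectively, and the batch size of 500 is an assumption you would tune.

```php
<?php
// Stub standing in for Drupal 7 core's node_delete_multiple().
function node_delete_multiple(array $nids) {
  echo "deleted batch of " . count($nids) . " nodes\n";
}

// Hypothetical list of nids to delete; in the drush script this is $nids
// collected from the db_query() result.
$nids = range(1, 1234);

// Delete in batches of 500 to keep memory usage bounded.
foreach (array_chunk($nids, 500) as $batch) {
  node_delete_multiple($batch);
}
```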

 

To run the script, cd to the directory where you saved it and issue drush scr filename. Once it starts running, the nodes are deleted batch by batch, saving you the manual deletion effort when you have a colossal number of nodes. I hope this helps. You will likely need to alter the query to make this work in your own scenario.