Tuesday, September 15, 2015

How to migrate large data (e.g. 100k records or more) into Drupal 7 smoothly?

Data migration... always a tedious job, especially for large datasets. Traditionally a programmer has to take care of many things to migrate data into Drupal correctly: how many records to process per batch, manually watching each batch, PHP timeouts, and more. So the best solution is to combine batch processing and Drush with the Queue API.

I had quite a bit of data to work with, so I had to utilize the Batch API. The Batch API allows you to run one or more methods over a large set of data without worrying about PHP timeouts, and it can provide feedback on the progress of the operation. I created a module to handle the updating and importing of the library data. To create the batch queue, you must build an array for batch_set():

function mymodule_setup_batch($start=1, $stop=100000) {
  //  ...
  //  Populate $lots_of_data from record $start to record $stop.
  //  ...
 
  //Break up all of our data so each process does not time out.
  $chunks = array_chunk($lots_of_data, 20);
  $operations = array();
  $count_chunks = count($chunks);
 
  //For every chunk, assign the methods to run on that chunk of data.
  $i = 0;
  foreach ($chunks as $chunk) {
    $i++;
    $operations[] = array('mymodule_method_to_work_on_a_small_part', array(
      $chunk,
      t('(Importing chunk @chunk of @count)', array('@chunk' => $i, '@count' => $count_chunks)),
    ));
    $operations[] = array('mymodule_another_method', array($chunk));
  }
 
  //put all that information into our batch array
  $batch = array(
    'operations' => $operations,
    'title' => t('Import batch'),
    'init_message' => t('Initializing'),
    'error_message' => t('An error occurred'),
    'finished' => 'mymodule_finished_method'
  );
 
  //Get the batch process all ready!
  batch_set($batch);
  $batch =& batch_get();
 
  //Because we are doing this on the back-end, we set progressive to false.
  $batch['progressive'] = FALSE;
 
  //Start processing the batch operations.
  drush_backend_batch_process();
}
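For illustration, $lots_of_data could come from a range query against the source data; the legacy table name and key column in this minimal sketch are assumptions, so adjust them to your own schema:

function mymodule_fetch_records($start, $stop) {
  //Pull records $start through $stop from a hypothetical legacy table.
  return db_query_range('SELECT * FROM {legacy_library} ORDER BY id', $start - 1, $stop - $start + 1)
    ->fetchAllAssoc('id', PDO::FETCH_ASSOC);
}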
You'll also have to write the operation methods themselves. Each of these will be called with the parameters we set up earlier. In this case both methods will work on the same data, one right after the other.
function mymodule_method_to_work_on_a_small_part($chunk, $operation_details, &$context) {
  //Do something to $chunk, maybe create a node?
  $context['message'] = $operation_details; //Will show what chunk we're on.
}
function mymodule_another_method($chunk, &$context) {
  //Do some more work.
  $context['message'] = t('We have done a second thing to a chunk of data');
}
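As a concrete illustration of what the first worker might do, here is a fleshed-out version that turns each record into a node. This is only a sketch: it assumes each $record is an associative array with 'title' and 'body' keys and an 'article' content type, all of which are hypothetical.

function mymodule_method_to_work_on_a_small_part($chunk, $operation_details, &$context) {
  foreach ($chunk as $record) {
    //Build and save one node per record; the content type and keys are assumptions.
    $node = new stdClass();
    $node->type = 'article';
    $node->language = LANGUAGE_NONE;
    node_object_prepare($node);
    $node->title = $record['title'];
    $node->body[LANGUAGE_NONE][0]['value'] = $record['body'];
    node_save($node);
    //Collect results so the finished callback can report totals.
    $context['results'][] = $record['title'];
  }
  $context['message'] = $operation_details; //Will show what chunk we're on.
}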
We also need to write the method that is called when the batch is finished:
function mymodule_finished_method($success, $results, $operations) {
  //Let the user know whether the batch completed successfully!
  print $success ? t('Finished importing!') : t('Import finished with errors.');
}

Drushing data

I have always enjoyed using Drush, but I had never created my own Drush commands. It turns out to be a very easy process. I decided to make an import command so I could kick off the batch process and import a section of the entire dataset from the terminal. I placed the above code into a file mymodule.drush.inc and created the following methods:
function mymodule_drush_command() {
  $items = array();
  $items['myimport'] = array(
    'callback'    => 'mymodule_setup_batch',
    'description' => dt('Import items from the Internal Database.'),
    'arguments'   => array(
      'start'     => dt('First record to import.'),
      'stop'      => dt('Last record to import.'),
    ),
  );
  return $items;
}
 
function mymodule_drush_help($section) {
  switch ($section) {
    case 'drush:myimport':
      return dt("Import items from the Internal Database [start record] [end record].");
  }
}
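With the command file in place (and caches cleared so Drush picks it up), the import can be kicked off from the terminal, e.g. drush myimport 1 100000 to process records 1 through 100000.

The intro also mentioned the Queue API. As a minimal alternative sketch (not shown in the batch code above; the queue name and worker callback are hypothetical), records could instead be pushed into a queue and drained on cron or via Drush's queue-run command:

/**
 * Implements hook_cron_queue_info().
 */
function mymodule_cron_queue_info() {
  //Tell Drupal which worker drains our queue, and for how long per cron run.
  $queues['mymodule_import'] = array(
    'worker callback' => 'mymodule_import_worker',
    'time' => 60,
  );
  return $queues;
}

function mymodule_enqueue_records($records) {
  //Push each raw record into the queue instead of batching it.
  $queue = DrupalQueue::get('mymodule_import');
  foreach ($records as $record) {
    $queue->createItem($record);
  }
}

function mymodule_import_worker($record) {
  //Process a single record, e.g. save it as a node.
}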
