Wednesday, August 17, 2011

Removing Duplicate Event Entries in Google Calendar with DupEventRemover

This is really annoying! Since I started heavily relying on my Google Calendar, I have run multiple times into the trap of events getting duplicated when synchronizing other devices and calendars with Google Calendar.
For private use I have fully switched to Google Calendar and Google Contacts (yes, I know, now they have all my personal data...). At home I use Lightning to access my calendar in Thunderbird and gContactSync for synchronizing the address book. At work I also use Thunderbird in the same configuration wherever I can. In addition, I sometimes have to use Lotus Notes, as this is our official corporate system, and for booking meeting rooms et cetera there is no way to bypass it. For synchronizing Notes with Google I use AweSync. And to have access to all my data on the move, I also sync my Android phone with Google.
Now, from time to time AweSync loses the connection to the Notes server or Google or both and stops syncing. For some reason I was unable to track down, the synchronization context sometimes gets lost when this happens. Events in Lotus are then re-synced to Google and get duplicated there.
This is nothing unusual; it surely happens every day in hundreds of different configurations. So I was quite sure I would find an easy solution to this problem, like the "Find and merge duplicates" option in Google Contacts or some small tool that solves the issue. But an Internet search turned up only some commercial tools and a few links to AppleScripts.
Technically speaking there are two options for cleaning up Google Calendar:
  1. Accessing the calendar online via Google's API (that's what the commercial tools do).
  2. Exporting the calendar in iCalendar format to an *.ics file, eliminating the duplicates in the local file and reimporting the cleaned-up calendar (that's what the AppleScripts do; at least the clean-up part).
Now, I don't have an Apple and I don't plan to get one. In addition, I was not willing to pay for a full iCal tool suite when I only have a fairly straightforward parsing job to do to get rid of the duplicated events in my calendar.

So there is only one option left (apart from living with an increasing number of duplicate events....): DIY

I decided to go for the offline approach: export the iCal file, work on it locally and reimport the result. First I went looking for an easy-to-use API to handle the iCal format from Java and found iCal4j by Ben Fortuna. Thanks to Ben for that fine piece of work! Next I did some coding.... and here we go, my calendar is clean and tidy again!

Now this post could end here, letting you know my problem is solved without making the solution available. As this would be crap and not really very nerdy, I decided to provide my source code here. I do not have time at the moment to develop this into a pre-compiled, easy-to-use piece of software with a fancy front-end. Nevertheless, I would like to share what I have done at this early stage, as there seem to be no other tools out there.

The easiest way to get the code working is to import it into Eclipse and add iCal4j to the build path.

 // ===========================  
 // DupEventRemover - License  
 // ===========================  
 //  
 // Copyright (c) 2011, Tobias Zimmer  
 // All rights reserved.  
 //  
 // Redistribution and use in source and binary forms, with or without  
 // modification, are permitted provided that the following conditions are met:  
 //  * Redistributions of source code must retain the above copyright  
 //   notice, this list of conditions and the following disclaimer.  
 //  * Redistributions in binary form must reproduce the above copyright  
 //   notice, this list of conditions and the following disclaimer in the  
 //   documentation and/or other materials provided with the distribution.  
 //  * Neither the name of Tobias Zimmer nor the  
 //   names of any other contributors may be used to endorse or promote products  
 //   derived from this software without specific prior written permission.  
 //  
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND  
 // ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED  
 // WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE  
 // DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR   
 // ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES  
 // (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;  
 // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND  
 // ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT  
 // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS  
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
   
   
 package de.cwtz.dupeventremover;  
   
 import java.io.FileInputStream;  
 import java.io.FileNotFoundException;  
 import java.io.FileOutputStream;  
 import java.io.IOException;  
 import java.util.Iterator;  
   
 import net.fortuna.ical4j.data.CalendarBuilder;  
 import net.fortuna.ical4j.data.CalendarOutputter;  
 import net.fortuna.ical4j.data.ParserException;  
 import net.fortuna.ical4j.model.Calendar;  
 import net.fortuna.ical4j.model.Component;  
 import net.fortuna.ical4j.model.ComponentList;  
 import net.fortuna.ical4j.model.ValidationException;  
 import net.fortuna.ical4j.model.component.VEvent;  
 import net.fortuna.ical4j.util.CompatibilityHints;  
   
 public class DupEventRemover {  
   
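      /**  
       * Reads an iCalendar file, removes duplicate events and writes the  
       * result to one or more output files.  
       *  
       * @param args optional source file name and destination file prefix,  
       *             or --help  
       */  
      public static void main(String[] args) {  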
   
           CompatibilityHints.setHintEnabled(  
                     CompatibilityHints.KEY_RELAXED_VALIDATION, true);  
   
           // Reading the file and creating the calendar  
           CalendarBuilder builder = new CalendarBuilder();  
           Calendar cal = null;  
           Calendar[] calOut = null;  
           String inputFile = null;  
           String outputFile = null;  
           String extensionFile = ".ics";  
             
           // Max number of events in one calendar file  
           int googleLimit = 2500;  
             
           //System.out.println(args.length);  
   
           if (args.length > 0) {  
                if (args[0].equals("--help")) {  
                     System.out.println("" +  
                               "Reads an iCalendar file, removes duplicate event entries and writes the result to one or more new files.\n" +  
                               "\n" +  
                               "DupEventRemover [source | --help] [destination]\n" +  
                               "\n" +  
                               " source \t Specifies the file to read calendar data from. Make sure you give the full name including the file extension.\n" +  
                               " destination \t Specifies the file(s) to write the new calendar to. Omit the file extension. It will be added automatically.\n" +  
                               " --help \t Displays this help.\n" +  
                               "\n" +  
                               "If you don't specify 'source' and 'destination', DupEventRemover will by default look for a file\n" +  
                               "named 'my.ics' in its current directory and write the new calendar to 'my_total_new_x.ics', where x is the number of the file.\n" +  
                               "If you don't specify destination, but only source, the default destination will be used.");  
                     System.exit(0);  
                } else {  
                     inputFile = args[0];  
                }  
           } else {  
                inputFile = "my.ics";  
           }  
   
           if (args.length > 1) {  
                outputFile = args[1];  
           } else {  
                outputFile = "my_total_new_";  
           }  
   
           System.out.println("Reading calendar file...");  
   
           try {  
                cal = builder.build(new FileInputStream(inputFile));  
           } catch (IOException e) {  
                System.out.println(e.getMessage());  
                System.out.println("Try typing 'DupEventRemover --help' for help.");  
                // e.printStackTrace();  
                System.exit(1);  
           } catch (ParserException e) {  
                System.out.println(e.getMessage());  
                // e.printStackTrace();  
                System.exit(1);  
           }  
   
           System.out.println("Start processing. Please wait...");  
   
           int nProcessed = 0;  
           int nDeleted = 0;  
   
           // For each VEVENT in the ICS  
           for (Object o : cal.getComponents("VEVENT")) {  
                Component c = (Component) o;  
                VEvent e = (VEvent) c;  
   
                for (Iterator i = cal.getComponents(Component.VEVENT).iterator(); i  
                          .hasNext();) {  
                     VEvent event = (VEvent) i.next();  
   
                     if ((event.getSummary() != null) && (e.getSummary() != null)  
                               && (event.getStartDate() != null)  
                               && (e.getStartDate() != null)  
                               && (event.getEndDate() != null)  
                               && (e.getEndDate() != null)) {  
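                          // Note: != compares object references here, not UID values,  
                          // so this check effectively just skips comparing an event  
                          // with itself.  
                          if ((event.getUid() != e.getUid())  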
                                    && (event.getSummary().getValue().equals(e  
                                              .getSummary().getValue()))  
                                    && (event.getStartDate().getValue().equals(e  
                                              .getStartDate().getValue()))  
                                    && (event.getEndDate().getValue().equals(e  
                                              .getEndDate().getValue()))) {  
                               nDeleted++;  
                               cal.getComponents().remove(c);  
                               break;  
                          }  
                     } else {  
                          if (e.getSummary() == null) {  
                               nDeleted++;  
                               cal.getComponents().remove(c);  
                               break;  
                          } else {  
                               // debug:  
                               // System.out.println((VEvent) event);  
                          }  
                     }  
                }  
   
                nProcessed++;  
   
                if ((nProcessed % 100) == 0)  
                     System.out.print(".");  
           }  
           System.out.println("");  
           System.out.println("Number of records processed: " + nProcessed);  
           System.out.println("Number of records deleted: " + nDeleted);  
             
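           // Ceiling division: avoids an extra, empty calendar when the number  
           // of remaining events is an exact multiple of googleLimit.  
           int nRemaining = nProcessed - nDeleted;  
           calOut = new Calendar[Math.max(1, (nRemaining + googleLimit - 1) / googleLimit)];  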
           for (int i = 0; i < calOut.length; i++){  
                calOut[i] = new Calendar(cal.getProperties(),new ComponentList());  
           }  
             
           int limitCounter = 0;  
           int calendarCounter = 0;  
           for (Iterator i = cal.getComponents(Component.VEVENT).iterator(); i.hasNext();) {  
                 VEvent event = (VEvent) i.next();  
                   
                 calOut[calendarCounter].getComponents().add(event);  
                 limitCounter++;  
                   
                 if (limitCounter >= googleLimit){  
                      limitCounter = 0;  
                      calendarCounter++;  
                 }        
           }  
             
           // write new calendar file(s)  
           for (int i = 0; i < calOut.length; i++){  
                String outFile = outputFile + i + extensionFile;  
                CalendarOutputter outputter = new CalendarOutputter();  
                // try-with-resources (Java 7+) closes the stream even if writing fails  
                try (FileOutputStream fout = new FileOutputStream(outFile)) {  
                     outputter.output(calOut[i], fout);  
                } catch (IOException e) {  
                     // also covers FileNotFoundException when the file cannot be created  
                     System.out.println("Could not write " + outFile + ": " + e.getMessage());  
                } catch (ValidationException e) {  
                     System.out.println("Calendar " + outFile + " failed validation: " + e.getMessage());  
                }  
           }  
      }  
 }  
   

As you can see, it is extremely rudimentary, even hack-style. It goes through an input iCal file, finds duplicates by looking for separate entries (typically with different IDs) that have the same summary (title), start and end time, and deletes them. It also deletes events that have no summary. Then it writes the cleaned-up calendar to a series of files, each containing a fixed maximum number of events (2500 at the moment).
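The matching rule can also be sketched without iCal4j. The following is only an illustration, not the code above: `SimpleEvent` and `DedupSketch` are hypothetical names made up for this example, and it uses a hash set of (summary, start, end) keys instead of the nested loops. Unlike the listing, it keeps the first copy of a duplicate rather than the last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a calendar event (not an iCal4j type).
final class SimpleEvent {
    final String uid;
    final String summary;
    final String start;
    final String end;

    SimpleEvent(String uid, String summary, String start, String end) {
        this.uid = uid;
        this.summary = summary;
        this.start = start;
        this.end = end;
    }
}

final class DedupSketch {

    // Drops events without a summary and every later event whose
    // (summary, start, end) key has already been seen.
    static List<SimpleEvent> removeDuplicates(List<SimpleEvent> events) {
        Set<List<String>> seen = new HashSet<>();
        List<SimpleEvent> kept = new ArrayList<>();
        for (SimpleEvent e : events) {
            if (e.summary == null) {
                continue; // events without a summary are deleted
            }
            if (e.start == null || e.end == null) {
                kept.add(e); // incomplete events are never treated as duplicates
                continue;
            }
            List<String> key = Arrays.asList(e.summary, e.start, e.end);
            if (seen.add(key)) { // add() returns false if the key was already present
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleEvent> events = new ArrayList<>();
        events.add(new SimpleEvent("a", "Meeting", "20110817T100000", "20110817T110000"));
        events.add(new SimpleEvent("b", "Meeting", "20110817T100000", "20110817T110000"));
        events.add(new SimpleEvent("c", "Lunch", "20110817T120000", "20110817T130000"));
        System.out.println("Events kept: " + removeDuplicates(events).size());
    }
}
```

The set lookup makes this a single pass over the events, which should matter for calendars with many thousands of entries, where comparing every event against every other one gets slow.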

I introduced this limit because there is a daily quota on the number of events you can create by uploading calendar files to Google Calendar. To my knowledge the actual quota is not documented. Sources on the Internet put it at 5000, but in my tests I sometimes ran into difficulties with as few as 4500. 2500 seems to be well on the safe side, but it makes uploading bigger calendars slow (mine has about 10k events, so it took me 4 days to upload).
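To make the arithmetic concrete: with a limit of 2500 events per file, a calendar of about 10,000 events ends up in 4 files. Here is a minimal sketch of that calculation and of the splitting itself; `ChunkSketch`, `fileCount` and `split` are hypothetical names used only for this illustration and work on plain lists rather than iCal4j calendars.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSketch {

    // Number of files needed for eventCount events, using ceiling division.
    static int fileCount(int eventCount, int limit) {
        return (eventCount + limit - 1) / limit;
    }

    // Splits a list into consecutive chunks of at most 'limit' items each.
    static <T> List<List<T>> split(List<T> items, int limit) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += limit) {
            int to = Math.min(from + limit, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10,000 events with a limit of 2500 per file
        System.out.println("Files needed: " + fileCount(10000, 2500));
    }
}
```

If Google accepts one such file per day, the number of files is also the number of days the upload takes.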

A word of caution: you mess around with your calendar at your own risk. Make backup copies and store them in safe places. As the license indicates, the code is provided 'as is'; I accept no liability for the fitness of the code or for the correctness of any other information in this post.
