This has certainly been done before, but here is a way to process the objects in a Java Collection (ArrayList, Vector, etc.) in batches of arbitrary size. The snippet below may be useful if you need to run a massive data manipulation operation and are concerned about server time-outs, lost connections, and similar problems. Running smaller batches helps prevent data corruption and problems related to system limitations. Executing data manipulations in batch mode is not all that complicated, but the snippet below may help in designing your Java code.
This particular sample assigns a lifecycle (dm_policy) to a select group of documents that come into the Method via an ArrayList, so the data (object IDs) are contained in the ArrayList. A Documentum Method is created, and the batch size is passed to the Java Method as a parameter. This makes it easy to change the batch size when running the data manipulation Java classes. The snippet below is part of a larger set of Java Classes & Methods for setting lifecycles but, as stated above, the batching logic is generic and can be used for any kind of data processing in Java.
1. boolean applyIt(ArrayList docIds, int batchSize, Map params) {
2.     int wholeNumBatches = getWholeNumBatch(docIds.size(), batchSize);
3.     int modU = getModulus(docIds.size(), batchSize);
4.     DfLogger.debug("LM", "DoApplyDLC.applyIt...WILL PROCESS " + wholeNumBatches + " WHOLE BATCHES AND A PARTIAL BATCH OF " + modU + " DOCUMENTS.", null, null);
5.     int m = 0;
6.     int t = batchSize;
7.     for (int p = 0; p < wholeNumBatches; p++) {
8.         for (int k = m; k < t; k++) {
9.             Object o = docIds.get(k);
10.             String id = o.toString();
11.             IDfSysObject sysObj = getSys(id, params);
12.             try {
13.                 String typeName = sysObj.getTypeName();
14.                 IDfId policyId = fetchPolicyId(typeName, params);
15.                 sysObj.attachPolicy(policyId, "WIP", null);
16.                 DfLogger.debug("LM", "DoApplyDLC.applyIt...set lifecycle for document : " + sysObj.getObjectName(), null, null);
17.             } catch (DfException e) {
18.                 DfLogger.debug("LM", "DoApplyDLC.applyIt...exception getting type name : " + e, null, null);
19.                 e.printStackTrace();
20.             }
21.         }
22.         // inner loop finished: one whole batch has been processed
23.         DfLogger.debug("LM", "DoApplyDLC.applyIt...FINISHED PROCESSING BATCH NUMBER : " + (p + 1), null, null);
24.         m = m + batchSize;
25.         t = t + batchSize;
26.     }
27.     // process partial batch, if there are any docs in a partial batch
28.     if (modU > 0) { ...........
29.     }
30.     return true;
31. }
Interpretation:
Line 1 shows that you want to pass in a Collection object, and need batch size to come from a parameter or other source.
Line 2 gets the number of whole batches to be processed. The way this Method works is that the whole batches are processed first, then a single partial batch is processed.
Line 3 fetches the modulus, that is, the remainder left after dividing the Collection size by the batch size. This can be done in a separate method such as the getModulus listing at the end of this post. The modulus is used for the partial batch; that part is not shown above but would start at Line 28.
Line 5 creates a variable m that marks the start index of the current batch. More about this later.
Line 6 creates a variable t, initially set to the batch size, that marks the end index of the current batch. More about this later.
Line 7 starts the outer loop. It runs once per whole batch, with p counting from 0 up to the number of whole batches. For example, if you have 1000 objects in a Collection and a batch size of 100, the loop runs 10 times (p goes from 0 to 9).
Line 8 starts the inner loop. k starts at m, which is 0 for the first batch and is increased by the batch size after each batch, and stops before t, so the inner loop runs batch-size times.
Line 9 gets the object at index k from the Collection.
Lines 10-23 are application specific. In this case they convert the object to a String, fetch what is called a sysobject from the system, and apply the lifecycle. Any other operation could be done here on the data fetched from the Collection.
Line 24 sets m to m + batch size so that the next batch starts at the correct location in the Collection. Note that this is outside the inner loop, so reading continues past the items the previous batch already accessed.
Line 25 sets t to t + batch size. Together with m, this keeps the index k within the current batch; without these two updates, every whole batch would re-read the Collection from index 0 to the end of the first batch.
Line 28. As stated above, the modulus indicates the number of objects "left over" in the Collection, to be processed as a single partial batch. The same processing can be applied to the partial batch if that is the requirement.
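To make the index bookkeeping easier to try out, here is a minimal standalone sketch of the same m/t window with the Documentum calls removed. The class name BatchSketch and the method toBatches are hypothetical names made up for this illustration, and the partial-batch step is only one plausible way to fill in the part the snippet above leaves out (it is not taken from the original classes).

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    /** Splits items into consecutive batches using the same m/t window as applyIt. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        int wholeNumBatches = items.size() / batchSize; // integer division
        int modU = items.size() % batchSize;            // items left over

        int m = 0;         // start index of the current batch (inclusive)
        int t = batchSize; // end index of the current batch (exclusive)
        for (int p = 0; p < wholeNumBatches; p++) {
            batches.add(new ArrayList<>(items.subList(m, t)));
            m += batchSize;
            t += batchSize;
        }
        // Assumption: after the loop, m points at the first unprocessed item,
        // so the partial batch is simply the next modU items.
        if (modU > 0) {
            batches.add(new ArrayList<>(items.subList(m, m + modU)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        // 10 items with a batch size of 4: two whole batches and a partial batch of 2
        for (List<Integer> batch : toBatches(ids, 4)) {
            System.out.println(batch);
        }
    }
}
```

In real processing code each batch would be handed to the application-specific work (Lines 10-23 above) rather than collected into a list; collecting them here just makes the batch boundaries easy to inspect.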
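The getModulus method called on Line 3 looks like this:
private int getModulus(int arraySize, int batchSize) {
    // remainder after dividing the Collection size by the batch size
    int modU = arraySize % batchSize;
    DfLogger.debug("LM", "DoApplyDLC.getModulus...modU: " + modU, null, null);
    return modU;
}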
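The companion getWholeNumBatch method called on Line 2 is not shown in this post. A reasonable guess is that it is plain integer division, and the sketch below is written under that assumption (the class name BatchCounts is hypothetical, and the DfLogger calls are left out so it runs without Documentum on the classpath).

```java
public class BatchCounts {

    // Assumed implementation: the original method is not shown in the post.
    static int getWholeNumBatch(int arraySize, int batchSize) {
        return arraySize / batchSize; // integer division discards the remainder
    }

    // Same arithmetic as the getModulus listing, minus the logging.
    static int getModulus(int arraySize, int batchSize) {
        return arraySize % batchSize;
    }

    public static void main(String[] args) {
        int arraySize = 1050;
        int batchSize = 100;
        int whole = getWholeNumBatch(arraySize, batchSize);
        int partial = getModulus(arraySize, batchSize);
        System.out.println(whole + " whole batches, partial batch of " + partial);
        // The two counts always account for every item in the Collection.
        System.out.println(whole * batchSize + partial == arraySize);
    }
}
```

Both methods would fail with an ArithmeticException for a batch size of 0, so it may be worth validating the batch size parameter before calling them.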