Documentum Technical Information

This blog provides useful technical information about EMC Documentum. Moderated by Steve Garrison of Nerka IT, Inc.

Generic Batch Process for Reading from a Collection. Permits Sending Batch Size as Parameter.

April 5, 2010 06:26 by author troutguy

This has certainly been done before, but here is a way to process objects from a Java Collection (ArrayList, Vector, etc.) in groups of arbitrary batch size. The code snippet below may be useful if you need to run a massive data manipulation operation and are concerned about server time-outs, lost connections, etc. In that case one runs smaller batches to reduce the risk of data corruption and of problems related to system limitations. Executing data manipulations in batch mode is not all that complicated, but the snippet below may help in designing your Java code.

 

This particular sample assigns a lifecycle (dm_policy) to a select group of documents that come into the Method via an ArrayList. So, the data (object IDs) are contained in the ArrayList. A Documentum Method is created and one passes the batch size as a parameter to the Java Method. This makes it easy to change the batch size each time the data manipulation Java classes are run. Obviously the snippet below is part of a larger set of Java Classes & Methods for setting a lifecycle. But, as I state above, this is a generic batch processor that can be used in any kind of data processing using Java.
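Since the batch size arrives as a Method argument, it has to be parsed before it can be used. Here is a minimal sketch of that step, assuming the arguments arrive in a Map whose values are String arrays. The argument name batch_size and the default of 100 are made up for illustration and are not from the original code.

```java
import java.util.HashMap;
import java.util.Map;

class BatchSizeParam {
    // Hypothetical default, used when the argument is missing or invalid.
    static final int DEFAULT_BATCH_SIZE = 100;

    // Reads a positive batch size from the method arguments.
    // Assumes each argument value is a String array; "batch_size" is a hypothetical name.
    static int readBatchSize(Map<String, String[]> params) {
        String[] values = params.get("batch_size");
        if (values == null || values.length == 0 || values[0] == null) {
            return DEFAULT_BATCH_SIZE;
        }
        try {
            int size = Integer.parseInt(values[0].trim());
            // A zero or negative batch size would break the batch arithmetic.
            return size > 0 ? size : DEFAULT_BATCH_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH_SIZE;
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("batch_size", new String[] {"250"});
        System.out.println(readBatchSize(params)); // prints 250
    }
}
```

Falling back to a default rather than failing is a design choice; for a destructive operation you may prefer to stop and report the bad argument.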

1.  boolean applyIt(ArrayList docIds, int batchSize, Map params) {
2.      int wholeNumBatches = getWholeNumBatch(docIds.size(), batchSize);
3.      int modU = getModulus(docIds.size(), batchSize);
4.      DfLogger.debug("LM", "DoApplyDLC.applyIt...WILL PROCESS " + wholeNumBatches + " WHOLE BATCHES AND A PARTIAL BATCH OF " + modU + " DOCUMENTS.", null, null);
5.      int m = 0;
6.      int t = batchSize;
7.      for (int p = 0; p < wholeNumBatches; p++) {
8.          for (int k = m; k < t; k++) {
9.              Object o = docIds.get(k);
10.             String id = o.toString();
11.             IDfSysObject sysObj = getSys(id, params);
12.             try {
13.                 String typeName = sysObj.getTypeName();
14.                 IDfId policyId = fetchPolicyId(typeName, params);
15.                 sysObj.attachPolicy(policyId, "WIP", null);
16.                 DfLogger.debug("LM", "DoApplyDLC.applyIt...set lifecycle for document : " + sysObj.getObjectName(), null, null);
17.             } catch (DfException e) {
18.                 DfLogger.debug("LM", "DoApplyDLC.applyIt...exception getting type name : " + e, null, null);
19.                 e.printStackTrace();
20.             }
21.         }
22.         // inner loop finished: one whole batch has been processed
23.         DfLogger.debug("LM", "DoApplyDLC.applyIt...FINISHED PROCESSING BATCH NUMBER : " + (p + 1), null, null);
24.         m = m + batchSize;
25.         t = t + batchSize;
26.     }
27.     // process partial batch, if there are any docs in a partial batch
28.     if (modU > 0) {    ...........
29.     }
30.     return true;
31. }

Interpretation:

Line 1 shows that you want to pass in a Collection object, and need batch size to come from a parameter or other source. 

Line 2 gets the number of whole batches to be processed. The way this Method works is that the whole batches are processed first, then a single partial batch is processed.

Line 3 fetches the modulus (the remainder), which can be computed in a separate method such as getModulus, shown at the end of this post. The modulus is the size of the partial batch. Processing of the partial batch is not shown above but would commence at Line 28.

Line 5 creates a new variable m, the index where the current batch starts - more about this later

Line 6 sets a new variable t to the batch size; t is the index where the current batch ends (exclusive) - more about this later

Line 7 start of outside loop.  This outside loop runs once per whole batch, with p counting the batches. For example, if you have 1000 objects in a Collection and a batch size of 100, wholeNumBatches will be 10 and p runs from 0 to 9.

Line 8 start of inner loop. Variable m starts at 0 and is increased by the batch size after each batch; k runs from m up to (but not including) t, so the inner loop executes batch-size times.

Line 9 get an object of data at correct location from Collection

Lines 10-23 This is application specific and in this case involves changing an object to a string, getting what's called a sysobject from the system, and applying a lifecycle. But any particular operation could be done here on the data fetched from the Collection.

Line 24 set m to m + batch size to keep the counter indexing the correct location in the Collection. Note this is outside the inner loop, so the next batch starts reading just past where the previous batch stopped.

Line 25 set t to t + batch size. Here again, this keeps index "k" reading at the correct location in the Collection; otherwise, with more than one whole batch, the Collection would be read again from 0 to the end of the first batch.

Line 28. As stated above, the modulus indicates the number of objects in the Collection "left over" to be processed as a single partial batch.  One can now do the same processing on the partial batch if that is the requirement.
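The whole walk-through, including the partial batch that is elided at Line 28, can be sketched in plain Java. This is only an illustration under my own assumptions: the class name BatchRunner, the method processInBatches and the Consumer callback are hypothetical stand-ins for the lifecycle code, and this is not the elided original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchRunner {
    // Runs 'action' on every item: whole batches first, then one partial batch.
    // Returns the number of batches processed (whole plus partial).
    static <T> int processInBatches(List<T> items, int batchSize, Consumer<T> action) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int wholeNumBatches = items.size() / batchSize;
        int modU = items.size() % batchSize;
        int m = 0;          // start index of the current batch
        int t = batchSize;  // end index (exclusive) of the current batch
        int batches = 0;
        for (int p = 0; p < wholeNumBatches; p++) {
            for (int k = m; k < t; k++) {
                action.accept(items.get(k));
            }
            batches++;
            m += batchSize;
            t += batchSize;
        }
        // After the loop, m points at the first unprocessed item, so the
        // partial batch covers indexes m .. m + modU - 1.
        if (modU > 0) {
            for (int k = m; k < m + modU; k++) {
                action.accept(items.get(k));
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add("doc" + i);
        }
        List<String> processed = new ArrayList<>();
        int batches = processInBatches(ids, 100, processed::add);
        System.out.println(batches + " batches, " + processed.size() + " items");
        // prints "3 batches, 250 items"
    }
}
```

Passing the per-item work in as a callback keeps the batching arithmetic separate from the application-specific code of Lines 10-23.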

 

 

private int getModulus(int arraySize, int batchSize) {
    // remainder = number of documents left over for the partial batch
    int modU = arraySize % batchSize;
    DfLogger.debug("LM", "DoApplyDLC.getModulus...modU: " + modU, null, null);
    return modU;
}
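The snippet at the top also calls getWholeNumBatch, which is not shown in this post. A minimal sketch of what it could look like, assuming it is plain integer division; the class name BatchMath and the guard against a non-positive batch size are my own additions, and logging is left out so the example runs without DFC.

```java
class BatchMath {
    // Number of complete batches: integer division discards the remainder.
    static int getWholeNumBatch(int arraySize, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return arraySize / batchSize;
    }

    // Same arithmetic as getModulus above, without the logging.
    static int getModulus(int arraySize, int batchSize) {
        return arraySize % batchSize;
    }

    public static void main(String[] args) {
        int size = 1050;
        int batch = 100;
        int whole = getWholeNumBatch(size, batch);
        int mod = getModulus(size, batch);
        System.out.println(whole + " whole batches, partial batch of " + mod);
        // prints "10 whole batches, partial batch of 50"
    }
}
```

A quick sanity check is that whole * batchSize + modulus always equals the size of the Collection, so no document is skipped or read twice.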

 

 



How to submit a single document immediately for fulltext indexing with FAST engine

January 12, 2010 03:23 by author troutguy

Sometimes it is really useful to immediately test a document's ability to be fulltext-indexed on a FAST server.  It is possible, of course, to submit the document again and again through the Documentum Administrator (DA) interface. If a document has failed you can right-click the document (from DA) and click Resubmit. But then you have to wait until the Index Agent listener picks up the document for indexing, which is more time consuming. You can also cause a document resubmission by editing it or by checking it out and back in, but the solution using the API below is fastest.

To submit a document right away: first get its object ID and then go into the API Tester in DA or IAPI32.exe and submit this short statement:

queue,c,090008468000289a,dm_fulltext_index_user,dm_force_ftindex,,,,debug
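If you need to do this for more than one document, the statement can be generated from the object ID. The helper below is only a sketch: the class name ForceFtIndexCommand is made up, and the comment on field positions reflects my understanding of the queue API argument order (priority, sendmail, due date, message), which you should treat as an assumption. It only builds the text of the statement; you still paste it into the API Tester or IAPI.

```java
class ForceFtIndexCommand {
    // Builds the same statement as above for a given object ID.
    // Assumed field order after the event name: priority, sendmail, due date,
    // message. The first three are left empty and the message is "debug".
    static String build(String objectId) {
        // Documentum object IDs are 16 hexadecimal characters.
        if (objectId == null || !objectId.matches("[0-9a-fA-F]{16}")) {
            throw new IllegalArgumentException("not a 16-character object ID: " + objectId);
        }
        return "queue,c," + objectId
                + ",dm_fulltext_index_user,dm_force_ftindex,,,,debug";
    }

    public static void main(String[] args) {
        System.out.println(build("090008468000289a"));
        // prints "queue,c,090008468000289a,dm_fulltext_index_user,dm_force_ftindex,,,,debug"
    }
}
```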

A second advantage of submitting the above API command with the "debug" option is that you get valuable information about the document AFTER it is fulltext indexed.  Once you have submitted the command, look in this location for the debugging output:

...\Documentum\bea9.2\domains\DctmDomain\servers\DctmServer_IndexAgent1_DCTMFAST\data\IndexAgent1\export\IndexAgent\dctmFAST_IndexAgent1\content\debug\090008468000289a_1

Sorry about the path...of course they have to make you look for the output!   Remember, some of your server names and index agents will be spelled differently than above; those parts of the path vary from one installation to another.

There is one interesting file in this location now: an XML file with a listing of most (or all) of the attributes that will be indexed. In other words, the XML file contains all the information on attributes it is sending to the index.  See Fig. 1 below.

How are the attributes arranged inside the XML file?  Below is a small part of the document...enough to hopefully get you interested. As you can see, FAST InStream has converted the attribute values into XML elements and classified them by type (date, string, etc.).

In summary, this technique for debugging a single document by forcing fulltext indexing through API is a useful way to debug entries if you have customized fulltext indexing or if you are troubleshooting a problem relating to metadata indexing.

 

<?xml version="1.0" encoding="UTF-8"?>

<dmftdoc dmftkey="090008468000289a">

    <dmftkey>090008468000289a</dmftkey>

    <dmftmetadata>

        <dm_document>

            <r_object_id dmfttype="dmid">090008468000289a</r_object_id>

            <object_name dmfttype="dmstring">Product Support Matrix.pdf</object_name>

            <r_object_type dmfttype="dmstring">dm_document</r_object_type>

            <r_creation_date dmfttype="dmdate">2008-10-23T14:17:54</r_creation_date>

            <r_modify_date dmfttype="dmdate">2008-10-23T14:17:54</r_modify_date>

            <r_modifier dmfttype="dmstring">Steve Garrison</r_modifier>

            <r_access_date dmfttype="dmdate">2008-10-23T08:19:16</r_access_date>

 XML file continues
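To check what is being sent to the index without reading the file by eye, the elements and their dmfttype values can be listed with the standard Java DOM parser. This is a sketch under assumptions: the class name DmftDebugReader is made up, and the sample string is a shortened copy of the excerpt above with closing tags added by me so that it parses. The real debug file is longer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DmftDebugReader {
    // Maps each element that carries a dmfttype attribute to "type: value".
    // If an element name occurs more than once, the last occurrence wins.
    static Map<String, String> readAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList all = doc.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                if (el.hasAttribute("dmfttype")) {
                    result.put(el.getTagName(),
                            el.getAttribute("dmfttype") + ": " + el.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse debug XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened excerpt; closing tags added for illustration only.
        String xml = "<dmftdoc dmftkey=\"090008468000289a\">"
                + "<dmftkey>090008468000289a</dmftkey>"
                + "<dmftmetadata><dm_document>"
                + "<r_object_id dmfttype=\"dmid\">090008468000289a</r_object_id>"
                + "<object_name dmfttype=\"dmstring\">Product Support Matrix.pdf</object_name>"
                + "<r_creation_date dmfttype=\"dmdate\">2008-10-23T14:17:54</r_creation_date>"
                + "</dm_document></dmftmetadata></dmftdoc>";
        readAttributes(xml).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

For a real file you would read its contents from the debug directory instead of using an inline string.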

 

 

Figure 1. Image of directory where debugging files stored
