[lug] Looking for C++ help

Tkil tkil at scrye.com
Wed Feb 21 21:20:11 MST 2001


>>>>> "Ken" == Ken Weinert <kenw at ihs.com> writes:

Ken> OK, I'm trying to find an elegant way to do this.

Ken> I'm using OpenSP to parse an incoming SGML file - I need to write
Ken> out flat files (maintaining relationships) for later loading into
Ken> a database.

Ken> I've a document class with member functions for each of the
Ken> relevant tags. My question is this: is there an easy/elegant way
Ken> of taking the startElementEvent of onsgmls and dispatching
Ken> the collected data into my document class?

after a quick glance through the OpenSP docs:

   http://openjade.sourceforge.net/doc-1.4/index.htm

i have to ask -- what data are you trying to dispatch and record?

i'm also not clear what you mean by "writing a flat file, maintaining
relationships".  relationships between what?

from what little i know of SGML, you can take a broad view where
elements have three things that they "own":  their identification,
their attributes, and their contents.  which of those three are you
trying to record for later?

put another way, you end up with a strict tree of elements, where each 
element has some intrinsic information (most likely, its name), a
collection of information (attribute/value pairs) attached to each
element, and zero or more children under that element.

looking at the text stream coming in, you're seeing (in effect) a
depth-first walk of this tree: each start tag arrives before that
element's children, and each end tag arrives after them.

so, if you only care about elements and their attributes, then you
should be able to watch only the startElement method.  if you care
about the contents between the start tag and the end tag, you will
also need to catch the text and end-tag events and keep track of
which element they belong to, which is a bit more hairy.

finally, how to dispatch data.  if you're talking about Attributes,
the most OO approach that comes to mind is to subclass Attribute to
handle the different ones you care about, then call a single virtual
method and let each subclass decide what to do.

in this case, however, it seems that OpenSP hands you the Attributes
already built, which means that you have to dispatch off the attribute
name.  you can do this with something like the STL map container.  if
your document-handling class is, say, "MyDoc", and each of the
attribute handlers accepts (say) a const Attribute reference, you
might do:

   // typedef MyDocAttrHandler as pointer to member function
   typedef void (MyDoc::*MyDocAttrHandler)(const Attribute &);

   // establish a dictionary associating attribute names to their handlers.
   typedef std::map<std::string, MyDocAttrHandler> AttrMap;

you might use hash_map instead, depending on how many attributes you
have to deal with.  (for that matter, a simple linear scan is probably fast enough,
but this use of map is pretty clean.)

then, if you wanted to dispatch each of your attributes, you could do
something like this within your class:

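   private:
      AttrMap _attr_map;
      MyDoc doc_instance;   // the MyDoc that receives the collected data

   public:
      // declared inside the class body, so no "MyApp::" qualifier here
      void startElement(const StartElementEvent & ev)
      {
         for (size_t i = 0; i < ev.nAttributes; i++)
         {
            // read-only access is all we need
            const Attribute * ap = ev.attributes + i;
            // name is a (ptr, len) pair whose Char type may be wider
            // than char; copy it element by element (assumes ASCII names)
            std::string name(ap->name.ptr, ap->name.ptr + ap->name.len);
            AttrMap::const_iterator ami = _attr_map.find(name);
            if (ami != _attr_map.end())
            {
               MyDocAttrHandler h = ami->second;
               (doc_instance .* h)(*ap);   // call the handler on doc_instance
            }
            else
            {
               std::cerr << "no handler for attribute \"" << name << "\"" << std::endl;
            }
         }
      }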

where doc_instance is the actual instance of MyDoc.  (and, if it's not
obvious yet, none of this code has been compiled, let alone tested.
:) i'm not sure i got the syntax / usage for that pointer-to-member-
function thing correct (but see the end of this message).  there's a
bit about them in Stroustrup's "The C++ Programming Language" (3rd ed, page 420).

if you're trying to actually record everything inside each element
(child elements and text), you suddenly have to do two things.  first,
and easier, is to maintain a
stack of where you are in the document, so you know when you're done
with one particular element.  the harder task is to figure out a
useful serialized format that you can use to write a "flat file" that
still "maintains relationships".  that's rather what SGML does quite
well, so I'm not sure what you're trying to do here.  (maybe reduce an 
arbitrarily deep tree to two values (parent, child) on each line?)

anyway.  hopefully this has given you some ideas.  i have a fair pile
of examples that are similar in spirit, despite being written in a
very different language (perl, using libwww-perl's HTML::Parser
class).

hope this helps,
t.

p.s. here's a simple example of using a map to hold pointers-to-
     member-functions:

| #include <iostream>
| using std::cin;
| using std::cout;
| using std::cerr;
| using std::endl;
|
| #include <string>
| using std::string;
|
| #include <map>
|
| class PTMFTest
| {
| public:
|     void foo(int i) { cout << "foo: i=" << i << endl; }
|     void bar(int i) { cout << "bar: i=" << i << endl; }
|     void baz(int i) { cout << "baz: i=" << i << endl; }
| };
|
| // PTMF: pointer to a member function of PTMFTest taking an int
| typedef void (PTMFTest::*PTMF)(int);
|
| // maps a handler name to the member function that implements it
| typedef std::map<string, PTMF> PTMFMap;
|
| int main()
| {
|     PTMFMap my_map;
|     my_map["foo"] = &PTMFTest::foo;
|     my_map["bar"] = &PTMFTest::bar;
|     my_map["baz"] = &PTMFTest::baz;
|
|     PTMFTest t;
|
|     string s;
|     int i;
|     // read "name number" pairs until input runs out; testing the
|     // extraction itself avoids dispatching stale values at EOF
|     while (cin >> s >> i)
|     {
|         cout << "dispatching " << s << "(" << i << ")" << endl;
|         PTMFMap::const_iterator mi = my_map.find(s);
|         if (mi != my_map.end())
|             ( t .* (mi->second) ) (i);   // call the member on t
|         else
|             cerr << "no handler found for \"" << s << "\"" << endl;
|     }
|
|     return 0;
| }



