Sundance Suggestion WebLog

Please enter any suggestions or observations under one of the following three categories:

1) Critical - causes Sundance to crash or abort.
If this is something you need fixed ASAP, please also email phillips@cs.utah.edu or riloff@cs.utah.edu.

2) Parsing/Extraction Errors - any incorrect behavior you notice. This can include a bad parse, an incorrect extraction, incorrect POS tagging, etc.
Please include the actual text given to Sundance and the resulting Sundance output.

3) Suggestions - anything you feel would be an improvement to Sundance is welcome, such as additional caseframe patterns, printing options, etc.

*Note: updates can be performed by editing the file: /uusoc/sys/www/cs.utah.edu/nlp/SundanceUpdate.html

Critical Updates

Author Date Problem
Ellen 8/10/05 Pitt reports that autoannotate crashes when given an empty text file, which probably means that Sundance crashes on them. I also found earlier references mentioning that AutoSlog crashes on empty text files. We really shouldn't *crash* on them. It also seems to crash when Sundance is run in -seg or -ext mode. *****Fix before next release***** (Bill)
Sid 1/15/07 All NP_has_<possessive> extraction patterns seem to be ignoring any selectional restrictions applied to them.
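
For the empty-file report above (8/10/05), one way to avoid the crash is to check the input before it ever reaches the parser. The following is only a minimal Python sketch of that idea, not Sundance code: read_nonempty_text, process_file, and the parse callable are hypothetical names made up for illustration.

```python
from pathlib import Path


def read_nonempty_text(path):
    """Return the file's text, or None if it is empty or whitespace-only.

    Hypothetical helper: the caller skips parsing when None comes back
    instead of handing an empty document to the parser.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text if text.strip() else None


def process_file(path, parse):
    """Run `parse` (a stand-in for the real entry point) only on non-empty input."""
    text = read_nonempty_text(path)
    if text is None:
        # Return an empty result rather than crashing on an empty file.
        return []
    return parse(text)
```

The same guard would apply regardless of mode (-seg, -ext), since it runs before any parsing starts.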

Parsing/Extraction Errors

Author Date Error
Bill Phillips 8/29/06 PP attachment is sometimes done incorrectly during the Aslog-TS algorithm. It appears to be related to AuxVerbs.
Bill Phillips 8/29/06 A memory leak has been encountered. It appears to be related to the reference counting mechanism. Needs to be investigated further.
Art Munson (Cornell) 8/10/05 I thought I'd let you know that the default makefile didn't quite work when I installed on Solaris. These lines specifically caused trouble:

## Version number
major := $(shell egrep '^.define +MAJOR_NUM' miscglobals.h | sed 's/.*\"\(.*\)\"/\1/')
minor := $(shell egrep '^.define +MINOR_NUM' miscglobals.h | sed 's/.*\"\(.*\)\"/\1/')

I think the problem is that make on Solaris doesn't understand the := syntax. I seem to recall running into the same problem with a makefile I was writing a year ago. I worked around it by commenting out the above lines and hardcoding the version number.
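
Both := (as simple assignment) and $(shell ...) are GNU make features, so building with GNU make is one option. Another is to compute the version outside of make. The Python sketch below shows what the two makefile lines appear to compute; it is an illustration only, and it assumes (from the egrep/sed patterns) that miscglobals.h holds lines shaped like #define MAJOR_NUM "...". The function name read_version is hypothetical.

```python
import re

# Assumed line shape, inferred from the egrep/sed patterns above:
#   #define MAJOR_NUM "<value>"
_DEFINE = re.compile(r'^.define +(MAJOR_NUM|MINOR_NUM).*"(.*)"')


def read_version(header_text):
    """Return the quoted MAJOR_NUM / MINOR_NUM values found in the header text."""
    found = {}
    for line in header_text.splitlines():
        match = _DEFINE.match(line)
        if match:
            # group(1) is the macro name, group(2) the text between the quotes
            found[match.group(1)] = match.group(2)
    return found
```

Calling read_version on the contents of miscglobals.h would give a dictionary whose values could then be passed to make on the command line instead of being hardcoded.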

Ellen 12/15/04 The recognizer patterns don't seem to fire when there are two successive instances of the same recognizer type. For example, here's the output for the (admittedly weird) sentence: "I gave the teacher George the teacher George." You'd expect both the IOBJ "teacher George" and the DOBJ "teacher George" to be labeled as people, but only the first one (IOBJ) is.

Original : I gave the teacher George the teacher George.
PreProc : I gave the teacher George the teacher George >PERIOD

CLAUSE:
NP SEGMENT-ProperNoun (SUBJ):
[I (LEX)(PN SINGULAR(HUMAN))]

VP SEGMENT (ACTIVE_VERB):
[gave (LEX)(V BASE)]

NP SEGMENT-Person_Name (IOBJ):
[the (LEX)(ART)]
[teacher (LEX)(N SINGULAR(PERSON_DESC))]
[George (LEX)(N SINGULAR(PERSON_NAME))]

NP SEGMENT (DOBJ):
[the (LEX)(ART)]
[teacher (LEX)(N SINGULAR(PERSON_DESC))]
[George (LEX)(N SINGULAR(PERSON_NAME))]

[>PERIOD (LEX)(PUNC)]

The recognizer patterns do seem to work correctly when there are two successive instances of *different* types (e.g., the IOBJ is a person and the DOBJ is a location).
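
A quick way to spot this symptom in bulk output is to scan the printed parse for NP segments that contain a PERSON_NAME word but carry no Person_Name label. The Python sketch below does that. It assumes the output format is exactly as printed above (segment headers like "NP SEGMENT-Person_Name (IOBJ):" followed by bracketed word lines); unlabeled_person_nps is a hypothetical helper, not part of Sundance.

```python
import re

# Header shape taken from the output above, e.g. "NP SEGMENT-Person_Name (IOBJ):"
_HEADER = re.compile(r'^(\w+) SEGMENT(?:-(\w+))? \((\w+)\):$')


def unlabeled_person_nps(output_text):
    """Return the roles of NP segments that hold a PERSON_NAME word
    but whose header lacks the Person_Name label."""
    segments = []
    for raw in output_text.splitlines():
        line = raw.strip()
        header = _HEADER.match(line)
        if header:
            phrase, label, role = header.groups()
            segments.append({"phrase": phrase, "label": label,
                             "role": role, "words": []})
        elif line.startswith("[") and segments:
            # Bracketed word lines belong to the most recent segment header.
            segments[-1]["words"].append(line)
    return [
        seg["role"]
        for seg in segments
        if seg["phrase"] == "NP"
        and seg["label"] != "Person_Name"
        and any("(PERSON_NAME)" in word for word in seg["words"])
    ]
```

Run on the output above, this flags the DOBJ segment and leaves the IOBJ segment alone, which matches the behavior described.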

Suggestions

Author Date Suggestion
Sid 02/27/08 Numbers are tagged in Sundance in three different ways: POS tag, semantic tag, and using the "&&". This could be problematic, and it would be a good idea to encode/tag numbers in just one way.
Bill 10/09/06 The way the semantic information is stored inside Sundance is potentially problematic right now, because there are two separate data structures where the information can be stored. The POS class contains a list of semantics. But the POS class also contains a list of senses, where each sense has its own list of semantic tags. These two locations are in conflict. See comments by Dave B. in the POS class regarding the creation of the list of semantics directly in the POS class. This seems to be the preferred method, but the old structures are still there and seem to be getting populated, as well as used by other code segments. For example, the Word::getSemFeatures() method still uses the semantics associated with the senses, while Word::hasPOSWithSemFeature() uses the list in the POS class! Need to find all places where the semantic info gets added, and decide on only one place to store it (probably POS is better). Then update all code to reflect this. -WFP 10/06
Bill 8/29/06 The trigger word used internally as the key for finding candidate caseframes in extraction mode can result in too long a list being examined; this is especially true for patterns with AuxVps. A better key should be used to improve speed.
Bill 12/05 Make the phrase-level semantic tags (which are generated from the recognizer patterns when a rolled NP is made from them, and stored as the RolledPatterns in the code) part of the semantic hierarchy. The PatternTypes will have to be incorporated into the semantic hierarchy to do this.
Bill 12/05 Make a new extraction slot type for caseframes. The slot will extract only modifiers of the specified phrase. The modifiers to be extracted will be determined by a specified semantic category. For the given sem. category, every word starting with the first modifier tagged with the sem. category, up to the last word with the sem. category (not including the head), will be extracted. Any intervening words will also be extracted, regardless of their sem. tags. The syntax might look something like EXT: MOD(SEM:DATE)
Bill 12/05 Make activation fcns that look for a head word that has a semantic constraint, and possibly one for a head that doesn't have a semantic constraint.
Bill 12/05 Fix/improve passive recognition in Sundance parsing.
Bill 6/04 improve adj_< NP > pattern to fire on only adjectives, or also unknowns
Jingnan 7/04 For semantic constraints on caseframes, make Sundance accept negative constraints. (e.g. "~building" means words in this category should not be extracted. And if "phys-target" (parent category of "building") is also specified, it means anything other than "building" in "phys-target" should be accepted.)
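
The negative-constraint suggestion above (7/04) can be sketched as a check against the semantic hierarchy. This is a rough Python illustration under stated assumptions, not Sundance's implementation: the hierarchy is assumed to be a tree given as a child-to-parent map, a leading "~" marks a negated category, and the names ancestors, satisfies, and parent_of are hypothetical.

```python
def ancestors(category, parent_of):
    """Yield the category itself, then each ancestor up to the root.

    Assumes parent_of describes a tree (no cycles).
    """
    while category is not None:
        yield category
        category = parent_of.get(category)


def satisfies(category, constraints, parent_of):
    """Check a semantic category against positive and '~'-negated constraints."""
    lineage = set(ancestors(category, parent_of))
    negatives = {c[1:] for c in constraints if c.startswith("~")}
    positives = {c for c in constraints if not c.startswith("~")}
    if lineage & negatives:
        # The category is, or falls under, an excluded category.
        return False
    # With no positive constraint, anything not excluded is accepted.
    return not positives or bool(lineage & positives)
```

With "building" under "phys-target" and the constraints ["phys-target", "~building"], a category that is a sibling of "building" is accepted, while "building" and anything below it is rejected. One design choice made here, which the suggestion does not spell out, is that a negated category also excludes its descendants.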