
Affective user modeling.

One hard, and divisive, problem facing the AI community is that of building user models. Rather than the more traditional models, which focus on the mental processes of the user in problem-solving situations [Van Lehn1988], we propose an alternative in which only certain components of the user's affective state are modeled. This is a much smaller problem, but one which should provide useful leverage; it is akin to the feedback a responsive speaker makes use of when ``playing'' to her audience. We do not propose this as a full model of a user's emotional states, which would require that all of the hard cognitive modeling problems be solved as well.

To implement simple affective user modeling, several components are required:

(1) A structure that captures an agent's (in this case, the user's) outlook on the situations that arise. This structure must include some concept of the role, personality, and current state of the user (within the context of shallow emotion reasoning), which together form the basis for the user's idiosyncratic way of construing the world.

(2) A lexicon through which the user expresses his or her emotions to the computer.

(3) A comprehensive set of emotion categories that allows emotion-eliciting situations to be mapped to the emotion expression lexicon, and vice versa.

In our current work, as discussed above, we have implemented a broad, albeit shallow, representation of the first component, and a comprehensive, descriptive representation of the third.
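To make the three components concrete, the following Python sketch gives one possible shape for them. The class, the category set, and the lexicon entries are our own illustrative stand-ins, not part of the implemented system.

    from dataclasses import dataclass, field

    @dataclass
    class UserOutlook:
        """Component 1: the user's construal basis (shallow emotion reasoning)."""
        role: str                                          # e.g. "student"
        personality: dict = field(default_factory=dict)    # dispositional traits
        current_state: dict = field(default_factory=dict)  # transient mood, goals

    # Component 2: a lexicon mapping surface expressions to (category, intensity).
    EMOTION_LEXICON = {
        "sad":           ("distress", "low"),
        "brokenhearted": ("distress", "high"),
        "satisfied":     ("satisfaction", "medium"),
    }

    # Component 3: a fixed category set used to map emotion-eliciting
    # situations to the expression lexicon and back.
    EMOTION_CATEGORIES = {"joy", "distress", "satisfaction",
                          "fear", "anger", "hatred", "love"}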

The weakest link in such a system is the lexicon. How does a computer, which has no understanding of faces, and which presumably has no mechanism for generating the plausible explanations that might allow it to determine which emotions are likely to have arisen, know what emotion a user is expressing? In addressing this question we consider several leverage points which show promise in allowing us to work around this problem, at least to some degree. First, and most importantly, it may well prove true that users are motivated to express their emotions to the computer, provided there is at least the illusion that the computer understands how they are feeling. If so, we have some latitude to require that the user, who is adaptable in communication, conform to the protocol of the computer, which is not. Second, the comprehensive emotion model allows us to develop a large lexicon categorized by both emotion category and intensity. Third, speech recognition packages are advanced enough to capture some of what interests us with respect to the lexicon.

For example, using the work of Ortony et al. as a guide [Ortony, Clore, & Foss1987], we built expressions containing emotion words, intensity modifiers, and pronoun references to different roles (e.g., I am a bit sad because he..., I am rather sick at heart about her..., I was pretty embarrassed after my...) [Elliott & Carlino1994]. The phrases contained 198 emotion words (e.g., ..., bothered, brokenhearted, calm, carefree, chagrined, charmed, cheered, cheerful, cheerless, ...). In preliminary runs we detected 188 of the emotion words correctly on the first try, in context, with 10 false positives. Misses tended to be confusions such as ``anguish'' for ``anguished,'' and ``displeased'' for ``at ease.'' There were 10 other instances of difficulty with other parts of the phrases, such as confusing ``my'' with ``I.'' Most of these would have been caught by a system with even rudimentary knowledge of English grammar.
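A toy Python sketch of this kind of phrase matching follows, using a handful of the 198 words. The pattern, the intensity scale, and the word-to-category mapping are simplified stand-ins for the real phrase set.

    import re

    # Hypothetical fragments; the implemented lexicon had 198 emotion words.
    INTENSITY = {"a bit": "low", "rather": "medium", "pretty": "high"}
    EMOTION_WORDS = {"sad": "distress", "embarrassed": "shame",
                     "charmed": "liking"}

    PATTERN = re.compile(
        r"i (?:am|was) (?P<mod>a bit|rather|pretty)?\s*(?P<word>\w+)",
        re.IGNORECASE)

    def parse_expression(utterance):
        """Return (category, intensity) for a recognized emotion word."""
        m = PATTERN.search(utterance)
        if m is None or m.group("word") not in EMOTION_WORDS:
            return None
        intensity = INTENSITY.get(m.group("mod"), "medium")
        return EMOTION_WORDS[m.group("word")], intensity

    print(parse_expression("I am a bit sad because he left"))
    # -> ('distress', 'low')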

Additionally, in other preliminary runs of our speech recognition package, the computer was able to recognize the seven emotion categories (anger, hatred, sadness, love, joy, fear, and neutral) which we did our best to communicate to it when speaking the sentence, ``Hello Sam, I want to talk to you.'' In this small exercise we broke the sentence into three parts, identifying each part as a ``word'' to the speech recognition system, and then trained each phrase for the seven different inflections. With practice we were able to get close to 100% recognition of the intended emotional state. To achieve this we had to be slightly theatrical, but not overly so, and there was a flavor of speaking with someone who is hard of hearing, but again, not overly so.
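The trick of registering each (phrase part, inflection) pair as a distinct ``word'' can be sketched as follows. The token format is our own invention, since the actual training happens inside the speech package; only the bookkeeping around it is shown.

    # Each (phrase part, inflection) pair becomes a distinct recognizer
    # "word"; recognizing the token recovers the intended emotion category.
    PHRASE_PARTS = ["hello sam", "i want", "to talk to you"]
    INFLECTIONS = ["anger", "hatred", "sadness", "love",
                   "joy", "fear", "neutral"]

    def build_vocabulary():
        """Token labels such as 'i want#sadness' for recognizer training."""
        return [f"{part}#{infl}"
                for part in PHRASE_PARTS for infl in INFLECTIONS]

    def emotion_of(token):
        """Recover the emotion category encoded in a recognized token."""
        return token.rsplit("#", 1)[1]

    vocab = build_vocabulary()           # 21 tokens: 3 parts x 7 inflections
    print(emotion_of("i want#sadness"))  # -> sadness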

Once the lexicon is established, and minimal natural language parsing is in place through the use of an ATN or other simple system, tokens can be interpreted either directly as situations in themselves, or as a set of features indexing into a case base to retrieve similar cases indicating a particular emotion category. To illustrate: from user input of ``I am satisfied with the results'' we might derive the situation ``the user is satisfied now in response to the comparison of her answer with the one just provided by the computer.'' On the other hand, given user input of ``I am happy now'' (spoken with a hateful inflection), we might derive the feature set user expresses happiness, user's inflection expresses hatred, which in turn retrieves cases of hatred masked by a contrasting verbal expression.
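The two interpretation paths might be sketched as below. The feature names and the case base contents are illustrative, and the matching is deliberately naive.

    # Path 1: consistent signals yield a situation directly.
    # Path 2: conflicting signals become features indexing a case base.
    CASE_BASE = [
        ({"expressed": "joy", "inflection": "hatred"},
         "hatred masked by a contrasting verbal expression"),
    ]

    def interpret(expressed, inflection):
        """Map (expressed emotion, vocal inflection) to an interpretation."""
        if inflection in (expressed, "neutral"):
            return f"situation: the user feels {expressed} now"
        features = {"expressed": expressed, "inflection": inflection}
        return [desc for feats, desc in CASE_BASE
                if all(features.get(k) == v for k, v in feats.items())]

    print(interpret("satisfaction", "neutral"))  # direct situation
    print(interpret("joy", "hatred"))            # retrieved case(s)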

Assuming such a system can be built, numerous applications in diverse domains become available for testing. We touch on these in the following sections, in concert with other issues.


