Psychometric Description of the TRUE Compatibility Test™: A Proprietary Online Matchmaking System
Rense Lange
Illinois State Board of Education
Integrated Knowledge Systems
Ilona Jerabek
Plumeus, Inc. / Queendom.com
James Houran
TRUE.com
Abstract.
Compatibility tests are the foundation of many online matchmaking services, but psychometric support for their use is ambiguous or unavailable. This paper summarizes the unprecedented application of Rasch scaling to assess the structure and validity of a proprietary online compatibility test, termed the TRUE Compatibility Test (TCT). In contrast to previous efforts at quantifying long-term romantic compatibility, the TCT integrates the principles of both similarity and complementarity of partners’ characteristics. The theoretical rationale for the measured constructs and the technical quality and validity of the test are outlined. Furthermore, the findings are discussed with respect to the conceptualization and measurement of relationship quality. It is argued that models of assortative mating in terms of gross similarity or complementarity are oversimplifications and that a schema or “couple-centered” approach is a more valid predictor of relationship satisfaction and stability. Specifically, it appears that men and women define relationship quality differently. Thus, irrespective of the necessary but unquantifiable element of “romantic chemistry,” our research suggests that couples with satisfying and stable relationships are distinguished by their ability to integrate qualitatively different issues into the relationship via complex mental processes.
Introduction
The Internet has become a standard fixture in our society, with communication being one of its most popular uses. Through extended communication on the Internet, many users have formed relationships with others online. However, research has only recently begun to address the subject of online relationship development (see e.g., Bonebrake, 2002; Wolak et al., 2003). Kiesler and Kraut (1999) discussed how the nature of online relationships varies, but it is clear that flirting and dating (Whitty, 2003, 2004) and other forms of social networking (Ahuvia & Adelman, 1992) constitute an important aspect of the Internet phenomenon.
In light of these trends, it is not surprising that there is an apparent increase in the use and societal acceptance of so-called “compatibility tests” offered by online matchmaking services (Houran et al., 2004). Compatibility testing typically refers to a method of pairing unfamiliar people for long-term, romantic relationships based on the demographics, stated personal preferences, and personality profiles of individuals within a candidate pool. This type of testing is somewhat different from, and arguably more difficult than, programs such as PREPARE and ENRICH that assess existing couples on the critical tasks related to early marital adjustment (see e.g., Fowers & Olson, 1986). Unfortunately, evidence for many advertised compatibility tests is either entirely missing or fails to meet scientific standards (Houran, 2004; Houran et al., 2004; cf. Thompson et al., 2005). However, we are aware of two notable exceptions. First, Wilson and Cousins (2003b) published their Wilson Relationship Compatibility Indicator (WRCI), which yields “compatibility quotients” for couples, in a peer-reviewed journal. This test is the foundation of the matchmaking service and website of Cybersuitors.com. Wilson and Cousins have shown that heterosexual couples’ scores on this test correlate significantly and positively (average r = .31, p < .01) with scores on the Marital Adjustment Test (Locke & Wallace, 1959).
The WRCI is based on the principle of homogamy (similarity of partners’ characteristics), as opposed to the principle of complementarity (dissimilarity) of partners’ characteristics. As reviewed by Wilson and Cousins (2003a, 2003b), and echoed in recent research by Luo and Klohnen (2005), cross-sectional and longitudinal research both suggest that similarity (“birds of a feather flock together”) is a better predictor of relationship quality than complementarity (“opposites attract”). It is important to note, however, that this conclusion is a gross oversimplification. The degree of similarity observed depends on the particular individual-difference domain studied, with romantic partners showing strong similarity in age and in political and religious attitudes; moderate similarity in education, general intelligence, and values; and little or no similarity in personality characteristics (for reviews, see Klohnen & Mendelson, 1998; Watson et al., 2004).
The second example of an evidence-based compatibility test concerns the matchmaking service and website of TRUE.com (formerly TRUEBeginnings). TRUE commissioned the development of a comprehensive online compatibility test designed to be broader in scope than the WRCI or the PREPARE marital preparation inventory by integrating the mixed literature on similarity and complementarity. The test was also required to be applicable to both heterosexual and same-sex matching. The resulting product was called the TRUE Compatibility Test™ (TCT).
The Present Paper
This report provides technical information concerning the TCT, summarized from the comprehensive Technical Manual for the TRUE Compatibility Test (TRUE & Jerabek, 2004). This online Manual spans nearly 100 pages and is available to any interested party for full scrutiny at: http://www.true.com/images/tctmanual.pdf?svw=footer. We show that, unlike the WRCI and other advertised compatibility tests, the TCT is the first compatibility test, offline or online, with reliability and validity as stringently defined in the 1999 edition of the Standards for Educational and Psychological Testing, issued jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (AERA, APA, NCME, 2002).[1]
As such, this paper is part of an active research program initiated by TRUE to encourage the sharing of evidence and findings among professional and academic psychologists. Moreover, this report reflects the conviction of TRUE and its contractors that there is an increasing need to enhance and maintain the status of online relationship research, and of cyberpsychology in general.
Accordingly, the first section provides a detailed theoretical rationale for the construction of the TCT. It describes the psychological and interpersonal variables assessed by this instrument, while providing sample questions and references to the academic literature that formed the basis for the TCT’s construction. Additional information regarding the contents of the TCT can be found in Appendix A of the online Manual. The following section describes the psychometric rationale for the TCT provided by Rasch scaling techniques, including an overview of the mathematical and statistical considerations involved in “scoring” the TCT. Next, we present Rasch analyses of the seventeen major factors of the TCT, based on data from 11,576 individuals.[2] These analyses include the items’ model fit, the results of bias tests, and validity evidence. Throughout, pertinent technical results are listed in the Appendices to the online Manual. A final section discusses the theoretical import of the TCT’s development and validation for refining how relationship quality is conceptualized and measured.
Although great effort was made to be as complete as possible in describing the TCT, including its reliability, validity, matching algorithms, theoretical rationale, and scoring methods, the TCT is a proprietary product of TRUE.com. To protect this product, the text of most items is omitted, and the analyses refer to the TCT items by number only. Similarly, to protect the identity of the items and factors, the latter are identified only by a number, so it is not possible to connect items and factors. Finally, “adaptive”[3] (i.e., answer- or group-specific) items, which are not answered by all TCT test-takers, were omitted from all psychometric analyses. This left a total of 218 items for analysis.
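The Rasch scaling referred to in this section is documented in the online Manual rather than here. For orientation only, the following Python sketch implements one common polytomous Rasch variant, Andrich’s rating scale model, which suits five-option items like those in Table 1, together with the standard infit and outfit mean-square statistics used to judge item fit. It is an illustrative assumption, not the TCT’s scoring code: this section does not state which Rasch variant the analyses used, and the function names and all parameter values (person measures, item difficulty, thresholds) are hypothetical.

```python
import math
from collections.abc import Sequence


def category_probs(theta: float, delta: float, taus: Sequence[float]) -> list[float]:
    """Andrich rating scale model: P(X = k) for categories k = 0..len(taus).

    theta: person measure; delta: item difficulty; taus: step thresholds
    shared by all items on the same rating scale (all in logits).
    """
    # Category k has exponent sum_{j<=k} (theta - delta - tau_j); category 0 has 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    peak = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(probs: Sequence[float]) -> tuple[float, float]:
    """Model-expected score and its variance for one person-item encounter."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def item_fit(responses: Sequence[int], thetas: Sequence[float],
             delta: float, taus: Sequence[float]) -> tuple[float, float]:
    """Infit and outfit mean-squares for one item across several persons."""
    sq_resid = sq_std_resid = info = 0.0
    for x, theta in zip(responses, thetas):
        e, w = expected_and_variance(category_probs(theta, delta, taus))
        sq_resid += (x - e) ** 2
        sq_std_resid += (x - e) ** 2 / w
        info += w
    outfit = sq_std_resid / len(responses)  # unweighted mean of squared z-scores
    infit = sq_resid / info                 # information-weighted mean-square
    return infit, outfit


# Hypothetical five-option item scored 0-4, and four hypothetical persons.
taus = [-1.5, -0.5, 0.5, 1.5]
print(category_probs(0.0, 0.0, taus))
print(item_fit([1, 2, 3, 4], [-1.0, 0.0, 1.0, 2.0], delta=0.0, taus=taus))
```

Infit and outfit values near 1.0 indicate responses about as variable as the model expects; values well above 1.0 flag noisy, potentially misfitting items.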
Overview of the TCT
The TCT was constructed by the second author’s online psychological testing firm, Plumeus, Inc. (www.Queendom.com). Drawing on a thorough literature review, the test was designed to address 99 key variables that determine each test-taker’s long-term, romantic compatibility with potential partners. For some of the variables, complementary or opposite matches are deemed better; for others, a similar match is deemed ideal (see e.g., Dryer & Horowitz, 1997; Klohnen & Luo, 2003). The goal of the TCT is to pair people appropriately with potential partners across as many relationship variables as possible. The test is organized hierarchically into top-level areas (factors), 13 second-level, more specific factors (subfactors), and 65 third-level, narrow characteristics (subscales) that make up these factors.
The TCT contains a pool of 616 items, some of which are core (administered to everyone) and some of which are adaptive (presented dynamically only if the test-taker responds inconsistently to the core questions in that factor).[4] On the basis of his/her responses, the test-taker receives a Feedback Report (a profile of his/her personality, habits, and attitudes, and how they can affect his/her romantic relationships), an Ideal Partner Report (a description of who the most complementary partner would be for him or her), and a Compatibility Report that reveals in detail to what degree potential partners are compatible with the test-taker. The test-taker also receives advice and tips tailored to his/her particular issues.
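The adaptive-item rule described above, where extra items appear only when a test-taker answers a factor’s core questions inconsistently, can be sketched as follows. This is a hypothetical Python illustration: the item names, the reverse-keying, the use of the population standard deviation as the inconsistency measure, and the threshold of 1.0 are all assumptions, since the actual TCT criterion is proprietary.

```python
from statistics import pstdev


def keyed_scores(responses: dict[str, int], reverse_keyed: set[str],
                 scale_max: int = 5) -> list[int]:
    """Flip reverse-keyed items so every core answer points the same way (1..scale_max)."""
    return [(scale_max + 1 - x) if item in reverse_keyed else x
            for item, x in responses.items()]


def needs_adaptive_items(responses: dict[str, int], reverse_keyed: set[str],
                         threshold: float = 1.0) -> bool:
    """Hypothetical rule: flag a factor whose keyed core answers disagree.

    The spread measure (population SD) and the default threshold are
    illustrative assumptions, not the TCT's actual criterion.
    """
    scores = keyed_scores(responses, reverse_keyed)
    return len(scores) > 1 and pstdev(scores) > threshold


def next_items(responses: dict[str, int], reverse_keyed: set[str],
               adaptive_pool: list[str]) -> list[str]:
    """Return the factor's adaptive items only when its core answers are inconsistent."""
    if needs_adaptive_items(responses, reverse_keyed):
        return list(adaptive_pool)
    return []


# Hypothetical core items for one factor; raw answers coded 1 (completely
# untrue) to 5 (completely true), with "creature_of_habit" reverse-keyed.
core = {"go_with_the_flow": 5, "creature_of_habit": 5}
print(next_items(core, {"creature_of_habit"}, ["adv_extra_1", "adv_extra_2"]))
# ['adv_extra_1', 'adv_extra_2']
```

Here the keyed scores are 5 and 1, a spread of 2.0, so this illustrative rule would present the extra items.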
Overall Structure
The top-level factors are the largest, most general variables. These factors cover a spectrum of areas relevant to relationships, ranging from conflicts to sex life and communication. The subfactors formed the initial building blocks of the TCT, i.e., the main elements that were thought to be most important in the development of the test. Like the subscales, all subfactors are based on and supported by empirical research. In particular, to meet the criteria for inclusion, there had to be strong evidence of each issue’s importance for relationship satisfaction or relationship stability (longevity). To ensure face validity, each issue also had to be relevant in a common-sense fashion. For an explanation of what the more targeted subscales measure, refer to Appendix A of the online Manual.
Although the TCT matching algorithm must necessarily remain proprietary, we note that it uses responses to certain specific questions to match people with particular tastes. For instance, test-takers who indicated that romantic people are unattractive to them are not paired with hopeless romantics. Throughout, the matching process is governed mainly by the larger factors, i.e., the item combinations having the greatest reliability.
In general, the TCT matching algorithms use a compatibility matrix that includes:
· compatibility levels of individual traits, using similarity, dissimilarity, or complementarity algorithms depending on the issue
· interactions between specific traits
· gender-specific weighting of traits
· relative importance weighting of traits
· bidirectional algorithms for computing the final TRUE Compatibility Index™ (TCI).
The TCI is a metric akin to Wilson and Cousins’ (2003a, 2003b) “Compatibility Quotient.”
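The five ingredients listed above can be combined in many ways, and the TCT’s own formula is proprietary. The Python sketch below shows one plausible combination under stated assumptions: every trait is scored on a hypothetical 1 to 5 range; each trait is assigned a similarity, dissimilarity, or complementarity rule (dominance is treated as complementary, following Dryer and Horowitz, 1997); weights depend on the rater’s gender and on self-rated importance; a single illustrative interaction penalizes a partner who is both conflict-prone and unwilling to resolve conflict; and the two directional scores are combined with a geometric mean so that a one-sided match scores low. All trait names, weights, and the 0.8 penalty are hypothetical.

```python
import math
from dataclasses import dataclass, field

LO, HI = 1.0, 5.0  # hypothetical score range shared by all traits

# Hypothetical rule per trait; the real TCT mapping is proprietary.
RULES = {
    "communication_need": "similarity",
    "adventurousness": "similarity",
    "dominance": "complementarity",
}

# Hypothetical gender-specific weights, keyed by the rater's gender.
GENDER_WEIGHTS = {
    "f": {"communication_need": 1.5, "adventurousness": 1.0, "dominance": 1.0},
    "m": {"communication_need": 1.0, "adventurousness": 1.2, "dominance": 1.0},
}


@dataclass
class Person:
    gender: str                      # "f" or "m" in this sketch
    scores: dict[str, float]         # trait -> score in [LO, HI]
    importance: dict[str, float] = field(default_factory=dict)  # trait -> multiplier


def trait_fit(rule: str, a: float, b: float) -> float:
    """Per-trait compatibility in [0, 1] under one of three rules."""
    span = HI - LO
    if rule == "similarity":
        return 1.0 - abs(a - b) / span
    if rule == "dissimilarity":
        return abs(a - b) / span
    if rule == "complementarity":
        # Best when b mirrors a around the scale midpoint (high pairs with low).
        return 1.0 - abs((a + b) - (LO + HI)) / span
    raise ValueError(f"unknown rule: {rule}")


def directional_score(rater: Person, other: Person) -> float:
    """How well `other` suits `rater`: weighted mean of trait fits, in [0, 1]."""
    num = den = 0.0
    for trait, rule in RULES.items():
        w = GENDER_WEIGHTS[rater.gender][trait] * rater.importance.get(trait, 1.0)
        num += w * trait_fit(rule, rater.scores[trait], other.scores[trait])
        den += w
    score = num / den
    # Hypothetical interaction: a partner who is both conflict-prone and
    # unwilling to resolve conflict is penalized; neither trait alone triggers it.
    if (other.scores.get("conflict_proneness", LO) >= 4
            and other.scores.get("resolve_willingness", HI) <= 2):
        score *= 0.8
    return score


def compatibility_index(a: Person, b: Person) -> float:
    """Bidirectional index on a 0-100 scale: geometric mean of both directions."""
    return 100.0 * math.sqrt(directional_score(a, b) * directional_score(b, a))
```

The geometric mean is one design choice for the bidirectional step; an arithmetic mean would let one partner’s strong fit partly mask the other’s poor fit.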
Table 1 lists some illustrative questions used in the TCT. Additional details can be found in the text, and the remaining factors are presented in Appendix A of the online Manual.
| No. | Name | Example question 1 | Example question 2 |
| 1) | Comfort with Vulnerability | I find it ________ to say things like “I love you”, and “I am happy I found you”. a) Very easy b) Easy c) Awkward d) Difficult e) Very difficult | I give people the benefit of the doubt. a) Completely true b) Sort of true c) Somewhat true/somewhat false d) Sort of false e) Completely false |
| 2) | Open-mindedness | I appreciate the fact that my partner and I have differences of opinion, because discussing them helps us grow as people. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree | I encourage my partner to work towards his/her goals, even if I disagree with them. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree |
| 3) | Ability to Communicate | I become absorbed in what I’m saying, and fail to notice if others are bored or offended. a) Almost never b) Rarely c) Sometimes d) Quite often e) Most of the time | I try to be sensitive to the needs of others and anticipate their reactions to my words and actions. a) Always true b) Often true c) Sometimes true d) Rarely true e) Never true |
| 4) | Sexual Prowess | Sexual fantasies are natural. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree | Ideally, how often would you like to have sex? a) At least once a day b) Two to three times a week c) Once a week d) Two to three times a month e) Once a month or less |
| 5) | Interaction with Others | You are one of three equally deserving employees eligible for a big promotion at work. How likely are you to think that you will be the one chosen for the promotion? a) Completely unlikely b) Unlikely c) Somewhat likely/somewhat unlikely d) Likely e) Highly likely | Spending time with others wears me out. a) Completely untrue b) Mostly untrue c) Somewhat true/somewhat false d) Mostly true e) Completely true |
| 6) | Social Network | I am very close with my family. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue | Without my loved ones, I would be lost. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| 7) | Conscientiousness | When I commit to doing something: a) I forget about it. b) I do it when/if I get around to it. c) I get it done, unless something more important comes up. d) I get it done. | When it comes to orderliness, I’m: a) A complete slob. b) A bit of a slob. c) Average. d) A bit of a neat freak. e) An utter neat freak. |
| 8) | Integrity | My friends would tell you: a) I’m honest to a fault. b) I’m generally honest. c) I sometimes stretch the truth. d) Not to believe a word I say. | If I am running late for an appointment, I: a) Contact the person/people I’m meeting to let them know. b) Rush but don’t call. c) Get there when I get there. |
| 9) | Adventurousness | I enjoy going with the flow and being playful. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue | I am a creature of habit. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| 10) | Rigidity | I have a tendency to resist changing how I am used to doing things. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree | You and your partner have a strong difference of opinion. Are you willing to let it go? a) Yes, I’ll drop it. b) It depends on how important the issue is. c) No, I’ll try to convince him/her to accept my point of view. d) No, I’ll insist that my partner accept my point of view. |
| 11) | Dominance | Even when I’m quite upset with my partner, it’s hard for me to bring it up. a) Most of the time b) Often c) Sometimes d) Rarely e) Never | I make sure that my partner hears my point of view. a) Most of the time b) Often c) Sometimes d) Rarely e) Never |
| 12) | Healthy Attachment | I get more attached to others than they seem to get to me. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue | I need ______ reassurance from my partner about his/her feelings for me. a) Constant b) Frequent c) Occasional d) Rare e) No |
| 13) | Psychological Strength | People tell me that my moods are unpredictable. a) Quite often b) Often c) Sometimes d) Rarely e) Never | When I’m presented with a problem, I’m able to develop an effective solution. a) Almost never b) Rarely c) Sometimes d) Quite often e) Most of the time |
Relationship Variables
Communication Style. Several studies have shown that an effective and compatible communication style is one of the pillars of relationships. Couples with ineffective or unconstructive communication are more likely to report relationship dissatisfaction and distress (Christensen & Shenk, 1991; Rogge & Bradbury, 1999). Having compatible communication skills improves a couple's chances of happiness. Many potential stumbling blocks in relationships can be overcome by communication, which is the greatest key to intimacy. In fact, lack of emotional closeness and feelings of alienation are the best predictors of depression in both men and women (Heim & Snyder, 1991). Reported relationship quality has been shown to be influenced by positive communication behaviors, such as spousal support, companionship, intimacy, and friendship (Jerabek, 2003; Julien et al., 2003; Pasch et al., 1997; Pasch & Bradbury, 1998; Prager, 1995).
An important aspect of the Communication Style factor is open-mindedness, which comprises tolerance of mood instability, tolerance for differences in opinion, tolerance for goal differences, and need for control. This construct measures how amenable the test-taker is to differing viewpoints, along with how willing s/he is to relinquish control. According to a study by Shackelford and Buss (1997), a lack of openness, in men and women alike, leads their spouses to hold them in lower esteem.
In addition, self-disclosure has been shown to be a good predictor of relationship satisfaction in both men and women (Hendrick et al., 1988). Self-disclosure requires comfort with vulnerability, which is based on Bowlby’s “defensively separate” construct in attachment theory (Bowlby, 1969, 1973). According to his theory, the defensively separate have a harder time becoming close to others. Their relationships are characterized by lower overall satisfaction and lower quality: they lack trust, and they experience more unpleasant emotions than positive ones (Meyer & Pilkonis, 2001; Simpson, 1990).
However, the amount of communication people desire in a romantic relationship differs greatly, both within a couple and between couples. Jerabek (2003) has demonstrated that the degree to which a relationship meets the person’s need to communicate is strongly correlated with self-reported relationship satisfaction. The TCT takes this issue into account and matches partners based on their preferences for connectedness in a relationship.
Accordingly, the Communication Style factor encompasses all of the above-mentioned aspects, including ideas about how much communication is needed in a relationship, which issues are worth discussing, and to what extent emotions should be shared. Other elements that contribute to overall communication style include level of comfort with displays of emotion, self-disclosure, need for intimacy, comfort with expressing and witnessing emotions, and willingness to be vulnerable in a romantic relationship. Also included are the need for intellectual discussions, general communication skills, sensitivity and tact, open-mindedness in terms of the communication process, and tolerance for differences in opinion and goals (Meyer & Pilkonis, 2001; Rogge & Bradbury, 1999; Simpson, 1990).
Conflict Resolution. The ways that individuals approach and resolve conflict situations can greatly influence their likelihood of establishing successful relationships. Conflict resolution skills are essential to overall relationship satisfaction, and working on developing skills in this area can produce a significant improvement in the couple’s happiness (Jerabek, 2003; Markman et al., 1993). By the same token, a problematic conflict resolution style (competitive, dominating, passive-aggressive, withdrawing, or submissive) in one or both partners leads to marital distress (Goeke-Morey et al., 2003; Gottman et al., 1998; Gottman & Krokoff, 1989; Kurdek, 1993, 1996; Kurdek & Schmitt, 1986).
Shackelford and Buss (1997) showed that when couples experience conflict in a variety of areas (specifically, affection and attention, jealousy, finances, sex, chores, and control and dominance), spouses feel less esteem for one another. As conflicts are inevitable in long-term relationships, the ability to negotiate solutions to a variety of issues in a non-threatening way is essential for the very survival of the romantic bond (Bradbury, 1998). Moreover, parental conflict has a negative impact on children’s adjustment and can lead to maladaptive behaviors in children (Cummings & Davies, 1994; Grych et al., 2000), especially in girls (Davies & Lindsay, 2004). There are many different ways of dealing with conflict, some more productive than others, so it is essential to consider whether respondents have what it takes to resolve conflict. In addition, some individuals are more likely to get into conflicts than others, which is also important to consider when pairing subjects.
Therefore, in addition to respondents’ conflict style, the TCT matching algorithms take into account their proneness to conflict and their willingness to resolve it, and assess how these factors interact.
Sex Life. While sexual satisfaction is far from being the single most important factor in relationship satisfaction (according to a Queendom.com poll, only 35% of respondents claim that a great sex life is absolutely necessary), sexual intimacy is important to some degree for most couples (Jerabek, 2003; Kelly & Conley, 1987; Perrone & Worthington, 2001). Intimacy brings partners closer and allows them to express their love for one another. However, people have different attitudes, experiences, and preferences related to sex, all of which can contribute to discord if either party is unwilling or unable to compromise.
Sexual attitudes and behaviors are fundamental both to sexual compatibility and to feeling fulfilled in this arena (Kelly & Conley, 1987). Since expectations and desires with respect to the frequency of sexual encounters vary widely across all age groups, the test-taker’s libido is assessed and matched with potential partners. Moreover, attitudes about what is acceptable sexual behavior and willingness to experiment have been shown to vary substantially across the life span, within age groups, in both genders, and within all sexual orientation categories (Queendom.com, 1999, 2000, 2001).
Kelly and Conley (1987) reported that sexual history influenced marital outcome in their study. Terman (1938) and Burgess and Wallin (1953) have shown that high levels of premarital sexual activity are associated with marital instability in men. In addition, sexual faithfulness helps build a sense of trust between partners, and by the same token, unfaithfulness can significantly tarnish it.
To ensure compatibility in all these areas, the Sex Life factor covers libido and desired frequency of sexual encounters, faithfulness, sexual experience and history, sexual attitudes, and sexual behavior.
Social Life. The amount of time a person likes to spend socializing, with whom they prefer to socialize, and their chosen social activities are all important when it comes to choosing a satisfactory partner (Asendorpf & Wilpers, 1998; Jerabek, 2003; Shackelford & Buss, 1997).
Social skills are important for relationship success, romantic and otherwise. Lack of people skills in one partner can lead to awkwardness and uncomfortable social situations. In fact, Shackelford and Buss (1997) demonstrated that both men and women who are married to disagreeable partners have less esteem for their partners. More generally, studies have shown that agreeableness prevents conflict with opposite-sex peers (Asendorpf & Wilpers, 1998; Graziano et al., 1996). Similarly, a negative and pessimistic attitude and approach to life can be very taxing for couples (Gottman et al., 1998; Julien et al., 2003; Pasch et al., 1997; Schulz et al., 2004).
The amount of couple and individual social life outside of the relationship is an important consideration. Interpersonal differences in this area can be great, causing clashes within couples about how much time to spend individually, together as a couple, and socially with others. Disagreement about how often and with whom the partners should or should not socialize can lead to recurring conflicts, jealousy, resentment, pent-up frustration, and feelings of abandonment, rejection, and injustice. Jerabek (2003) found a strong and positive correlation between satisfaction with couples’ social life and self-reported relationship satisfaction. Extroversion is an equally important consideration, as this trait influences how much time one wants to spend in the company of others. An introvert’s need for time alone can clash with an extrovert’s need for company, leading to feelings of suffocation in one and rejection in the other. In addition, Asendorpf and Wilpers (1998) reported that sociability predicts falling in love, while shyness prevents it, mainly by limiting the shy person’s exposure to potential partners.
Social support from sources other than the romantic partner is an important factor in a model predicting marital satisfaction (Perrone & Worthington, 2001). In addition, relying solely on one’s romantic partner to fulfill all emotional needs places a lot of pressure on that partner and can lead to unhealthy attachment and co-dependent behavior, which in turn result in relationship distress (Jerabek, 2000).
In sum, the Social Life factor includes aspects such as extroversion, social skills, agreeableness, positive attitude, sense of humor, selflessness, support network of friends and family, and desire for friendships outside of the relationship (both couple and individual).
Personal Characteristics. One’s identity, how one interacts with people, and one’s way of looking at the world are all important factors to take into consideration when it comes to finding a mate. A suitable partner will truly complement the other’s personality and have a similar outlook on life. Couples with similar personalities tend to have more satisfying relationships (Robins et al., 2000). The Personal Characteristics factor measures a variety of personality traits, along with attitudes about a range of issues.
Conscientiousness: The Conscientiousness subfactor includes the following subscales: adherence to routine, self-discipline, organization/planning, orderliness, and dependability/reliability. Conscientiousness plays out in every area of relationships. People who are conscientious are likely to be frustrated by a lower level of conscientiousness in their romantic partners. Also, high conscientiousness in men is associated with greater esteem from their wives (Shackelford & Buss, 1997).
Integrity: The Integrity subfactor measures the test-taker’s overall honesty, along with whether his/her behavior is trustworthy. It consists of dependability/reliability, loyalty, and honesty. Honesty and trustworthiness are two extremely important attributes that people desire in romantic partners (Fletcher et al., 1999). Establishing and maintaining trust is essential for continued commitment in a relationship (Wieselquist et al., 1999).
Adventurousness: Adventurousness is another trait on which a mismatch can have a negative impact on a couple. People scoring low on this trait are reluctant to try new things; they tend to be sedentary and prefer a routine. Highly adventurous people, on the other hand, are ready to take off at the drop of a hat, love to experience new things, thrive on change, and resent routine. Research has shown that sharing novel and exciting activities prevents boredom and stagnation in a relationship. Being adventurous together is good for a relationship (Aron et al., 2000); however, pairing two people who are far apart on adventurousness can cause serious personality clashes. The Adventurousness subfactor includes the following subscales: flexibility, open-mindedness, energy level and zest, spontaneity, and adherence to routine.
Rigidity: The Rigidity construct is assessed by the following subscales: need for control, standards (other-oriented perfectionism), flexibility, open-mindedness, and adherence to routine. It was included because the ability to adjust in order to reach a compromise with a partner is essential to relationships, and this ability is nearly absent in people high in rigidity. According to Wieselquist et al. (1999), pro-relationship acts, such as sacrificing one’s needs and preferences and making accommodations for a partner, help build relationship trust.
Dominance: The Dominance subfactor consists of assertiveness, tolerance for differences in opinion, tolerance for goal differences – support, and need for control. A number of studies have demonstrated that complementarity theory holds for this factor: people who are generally dominant tend to get along better with people who are more submissive, and vice versa (Dryer & Horowitz, 1997). However, dominant/submissive couples are at risk of developing co-dependency problems, and the submissive partner may fail to achieve successful differentiation of self, which is fundamental to long-term intimacy (Bowen, 1978; Guerin et al., 1996; Kerr, 1985; Skowron, 2000; Titelman, 1998).
Attitudes and worldview: This complex subfactor includes assessment of a variety of issues that frequently cause problems in romantic relationships.
· Gender roles: The TCT includes an assessment of the test-taker’s beliefs and attitudes regarding this potentially explosive issue. Perceived equality between the partners factors into relationship satisfaction (Perrone & Worthington, 2001). One’s gender role attitudes are reflected in numerous areas, but all of them essentially boil down to power balance in the relationship, from decision making, child care, chores and errands to division of financial resources.
· Money attitudes: This component measures the participant’s approach and attitude towards money. Arguments about whether to spend or save money can be a big stumbling block. If one partner is unable to curb his or her spending habits, the other partner may be resentful. The importance of money is also included in this measure; how driven one is, how many hours one is willing to work, and how much one desires to spend money on the trappings of wealth are all affected by the importance one places on money. Conflicts about money can be a major indicator of problems in marriage; couples who argue about money feel much less esteem for one another than couples who do not (Shackelford & Buss, 1997).
· Political and social attitudes: While standing on opposite sides of the left/right spectrum does not necessarily prevent partners from forming a successful bond, these attitudes, especially in their extreme form and even more so if combined with intolerance, can cause major clashes of opinion within the couple.
· Parenting style: Similar strategies and opinions about parenting are important when a couple decides to raise a child together. Parenting styles, as measured by attitudes, approaches, amount of affection shown to the children, and level of discipline used, vary widely and must be taken into consideration. Being a team when it comes to parenting is imperative, as inconsistencies will typically lead to adjustment problems and maladaptive behavior in children (Brody et al., 2003; Kim et al., 2003; Ruiz et al., 2002). Style differences can also cause difficulties in marital relations, specifically in terms of couple intimacy (O’Brien & Peyton, 2002).
· Relationship attitudes and dating philosophy: There are some issues that can actually make or break a relationship, such as differing levels of readiness to commit, different relationship values, and lack of consensus about whether the relationship will end in marriage or is just a short fling. Each individual enters into relationships with their own preferences and expectations. If fundamental differences do exist, a couple may find that they have to either compromise on important issues or look elsewhere (Gray-Little et al., 1996).
· Romantic attitudes: While some might argue that being romantic, attentive or chivalrous has little to do with long-lasting relationship satisfaction, these attitudes are not just skin-deep. Admittedly, romantic gestures and passionate attentiveness are observed more frequently in the early stages of courtship. Nevertheless, remembering anniversaries, breakfast in bed and small affectionate gifts can contribute to a lasting romance, the maintenance of passion and the feeling of being loved, which in turn have a major impact on relationship satisfaction (Bradbury et al., 2000; Jerabek, 2003).
Attachment Style. Some people prefer complete independence from their partner, while others rely on their partner for almost everything, from self-worth to personal identity to decision making. There are those who need independence, and others who prefer to be attached at the hip most of the time. It is crucial to know what a potential partner’s relationship style is like from the very beginning. Discrepancies in terms of attachment needs can lead to disappointment and conflict in the relationship (Christensen & Shenk, 1991; Simpson, 1990).
One aspect of a problematic attachment style is dependency, an inability to differentiate one’s self from one’s partner. Differentiation of self in intense emotional bonds is essential for the development of a healthy relationship. It allows for greater role flexibility and deeper intimate contact. Partners who maintain their sense of self can tolerate differences of opinion and are less emotionally reactive (Bowen, 1978; Kerr & Bowen, 1988). Conversely, partners in poorly differentiated marriages are less emotionally mature, have a limited capacity for both closeness and separateness, and tend to sacrifice self-development and their sense of personal identity to maintain stability in the relationship (Bowen, 1978; Kerr & Bowen, 1988; Schnarch, 1997).
The TCT also assesses other issues related to attachment style and dependency problems, such as security in a relationship and jealousy. Fear of rejection and abandonment is commonplace in couples with attachment problems. Studies of rejection sensitivity demonstrate that people who anxiously expect rejection tend to readily perceive it in the ambiguous or insensitive behavior of others, and that rejection-sensitive people and their partners are dissatisfied with their relationships (Downey & Feldman, 1996).
In addition, the TCT includes several subscales that assess the need for personal space (i.e., having a life separate from one’s partner), the need for privacy (i.e., understanding and respecting each other’s domain), expectations regarding the couple’s shared social life (couple friendships), and the need to maintain one’s individual friendships (socializing with others without the partner present). People with problematic attachment styles frequently hold dysfunctional relationship cognitions, which are linked to relationship dissatisfaction (Baucom & Epstein, 1990; Fincham et al., 1990; Kurdek, 1992). Unrealistic expectations and idealistic assumptions about how relationships should work set the stage for disappointment and a sense of failure when things don’t go as smoothly as one may wish. For example, people with dysfunctional relationship cognitions think that successful couples should never have any disagreements, should want to spend all their free time together, and should never be attracted to another person. They feel that people who love each other should not have any secrets, should not need any personal space, should share anything and everything, and should not need any friends other than their partner. In other words, they want to be “one body, one soul.” Once they realize that this is not the case in their relationship, they may panic and overreact to minor problems.
Stress Reaction. A relationship has the potential to be a great source of support in stressful times; yet, for people who deal poorly with stress, the potential deterioration of the relationship can only add more stress. The relationships of couples who react less productively to stressful life events may suffer when such events occur (Cohan & Bradbury, 1997). In addition, negative stressful events during the workday contribute to angry marital behavior in women and withdrawal in men (Schulz et al., 2004) and, more generally, to negative marital interactions (Crouter et al., 1989; Gottman & Levenson, 1988). According to Larson and Richards (1994), minor daily stressors, such as chores, childcare, and errands, have a major effect on the emotional lives of the partners and on the nature of family relationships. While some authors argue that gender differences in dealing with stress are somewhat overrated (Aries, 1996; Brody, 1999), several studies have demonstrated that men tend to use withdrawal (both emotional and behavioral) as a coping mechanism, whereas women are more likely to be critical, verbally confront their partners, and initiate conflict (Christensen & Heavey, 1990; Gottman & Levenson, 1988). Brody (1999) argued that marriage is one context in which women are likely to express more anger than men. These gender differences appear to be more pronounced under stress.
Since reaction to stress is highly subjective and depends on a person’s coping skills, sense of self-efficacy and ability to deal with adversity on an emotional level, the TCT includes a number of factors assessing these characteristics. In a study by Waldinger et al. (2004), the ability to correctly read emotions was linked with concurrent marital satisfaction as well as with interviewers’ assessments of long-term relationship stability and adjustment. It has also been demonstrated that emotional intelligence has a profound impact on one’s level of functioning, social success and happiness in general (Jerabek, 1999). Therefore, the TCT includes an assessment of emotional intelligence.
The Psychological Strength subfactor includes several aspects that address these issues: security in a relationship, dependency, need for control, self-esteem and self-confidence, mood stability (including anxiety, depression, anger control and moodiness), optimism and positive attitude. This construct is most similar to the “Big Five” notion of neuroticism. There is a consensus in the literature that neuroticism is a negative predictor of marital satisfaction (Kelly & Conley, 1987; Shackelford & Buss, 1997).
There is no doubt that dealing with unstable emotions in a partner is difficult and often leads to marital problems. However, the relationship between depression and marital distress is bi-directional. For example, 50% of women who are experiencing relationship problems report significant depressive symptoms (Weissman, 1987), and studies have shown that emotional distance and alienation predict depression in both sexes (Cano & O’Leary, 2000; Heim & Snyder, 1991). In addition, neuroticism in one of the partners has been shown to be one of the best predictors of marital distress and dissolution of the couple (Kurdek, 1997).
Negative emotional behavior (e.g., expressed anger, sadness, or contempt) has also been shown to differentiate satisfied from dissatisfied couples (Schaap et al., 1988). Likewise, self-esteem has been shown to be a good predictor of relationship satisfaction, especially in men (Bailey et al., 1987; Hendrick & Hendrick, 1988).
Basic Measurement Issues: Technical Quality of the TCT
Rasch Scaling
The Rasch model is increasingly used for other purposes as
well (for an overview, see, e.g., Bond & Fox, 2001), including applications
in clinical psychology (McCutcheon et al., 2002; Lange, Thalbourne et al.,
2000), psychiatry (Lange, Greyson et al., 2004; Lange, Thalbourne et al.,
2002), medicine (Lange, Donathan et al., 2002; Lange & Hughes, 2004), and
artificial intelligence (Lange, Greiff et al., 2004).
The major differences between successful Rasch modeling and the classical scaling approaches can be summarized by four “rules” (Embretson, 1999, p. 12, cf., Embretson, 1995):
1. The standard error of measurement differs between persons with different response patterns but generalizes across populations.
2. Shorter tests can be more reliable than longer tests.
3. Comparing test scores across multiple forms is optimal when test difficulty levels vary across persons.
4. Unbiased estimates of item properties may be obtained from unrepresentative samples.
In other words, the classical notion that all test scores are equally reliable is abandoned in favor of local (i.e., level-specific) standard errors of estimate (SE); there is no longer a single index of score reliability. Also, longer tests are not necessarily “better,” as, depending on the distribution of respondents’ trait levels, many questions are bound to be redundant. Thus, by using items that best address respondents’ different trait levels (i.e., by purposely using non-parallel forms), greater measurement precision is obtained. In the extreme, items can be selected specifically to optimize reliability (or, equivalently, minimize SE). When this is done in an interactive computerized fashion, one speaks of Computer Adaptive Testing, or CAT (see e.g., Wainer, 2000).
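As a rough illustration of the item-selection idea, the Python sketch below uses the standard Rasch result that a binary item's Fisher information at trait level B is Pni(1 − Pni), which peaks where person and item locations coincide, and greedily picks the most informative unused item. This maximum-information rule is one common CAT strategy, assumed here for illustration rather than taken from the TCT; the item bank and function names are hypothetical.

```python
import math


def rasch_prob(b: float, d: float) -> float:
    """Binary Rasch probability of an affirmative answer (person b, item d)."""
    return 1.0 / (1.0 + math.exp(-(b - d)))


def item_information(b: float, d: float) -> float:
    """Fisher information of a binary Rasch item at trait level b: P(1 - P)."""
    p = rasch_prob(b, d)
    return p * (1.0 - p)


def next_item(b_current: float, bank: dict[str, float], administered: set[str]) -> str:
    """Return the unadministered item that is most informative at b_current."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(b_current, bank[name]))


# Hypothetical item bank: item name -> difficulty in logits
bank = {"A": -2.0, "B": -0.5, "C": 0.4, "D": 1.8}
print(next_item(0.5, bank, administered={"C"}))  # B (closest remaining item to 0.5)
```

In a full CAT the trait estimate would be updated after each response and testing would stop once the local SE reached a target; those steps are omitted here.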
Basics. The Rasch scaling of binary (i.e., dichotomous) items assumes the form of a logistic regression model where each person and item is individually parameterized to derive the log odds of the probability P of observing an answer indicative of the trait under consideration. For binary items (i) and persons (n):
$$\ln\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i \qquad (1)$$
In the above, Pni reflects the
probability that person n will answer item i affirmatively, where
person n has trait level Bn and item i reflects
the trait amount Di. Note that the item and person parameters
share a common metric as defined by the left-hand side of Equation 1 – i.e.,
the log-odds of the probability Pni. Accordingly, all
quantities in the Rasch model are said to be expressed in logits.
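Equation 1 can be inverted to give Pni directly. The short Python sketch below (function names are illustrative) computes the probability from Bn and Di and maps it back to the logit scale, showing that a person located one logit above an item endorses it with a probability of about 0.73.

```python
import math


def rasch_probability(b_n: float, d_i: float) -> float:
    """P_ni from Equation 1: the log-odds of an affirmative answer equal B_n - D_i."""
    return 1.0 / (1.0 + math.exp(-(b_n - d_i)))


def log_odds(p: float) -> float:
    """Inverse of the logistic function, i.e., the logit scale shared by B and D."""
    return math.log(p / (1.0 - p))


p = rasch_probability(b_n=1.5, d_i=0.5)
print(round(p, 3))            # 0.731
print(round(log_odds(p), 3))  # 1.0
```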
Equation 1 shows that the Rasch model is additive in the parameters (Bn) and (-Di). Thus, in contrast to related models such as the two- and three-parameter logistic models (cf., Fisher, 1995), the Rasch model meets the first requirement for interval measurement, i.e., additivity (Michel, 1990).[5]
It follows from Fisher's principle of statistical sufficiency (see Wright & Stone, 1979) that the maximum-likelihood estimate of each parameter is the value at which the expected raw score corresponding to that estimate equals the observed raw score. Accordingly, raw scores are sufficient statistics for the parameters B and D; indeed, these quantities can be estimated independently of each other.
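The sufficiency property can be made concrete with a Python sketch that treats the item difficulties as already known (an assumption made for illustration; the JMLE procedure used later estimates persons and items jointly) and solves Σi Pni(B) = raw score for B by Newton-Raphson. Only the raw score enters the computation, so any two response patterns with the same raw score receive the same estimate. The function name and item values are hypothetical.

```python
import math


def estimate_person(raw_score: int, difficulties: list[float],
                    tol: float = 1e-8, max_iter: int = 100) -> float:
    """ML estimate of B_n for a binary Rasch test with known item difficulties.

    The estimate is the B at which the expected raw score, sum_i P_ni(B),
    equals the observed raw score, so the response pattern itself is not needed.
    """
    if not 0 < raw_score < len(difficulties):
        raise ValueError("zero and perfect scores have no finite ML estimate")
    b = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(b - d))) for d in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)  # d(expected)/dB
        step = (raw_score - expected) / information     # Newton-Raphson update
        b += step
        if abs(step) < tol:
            break
    return b


# Hypothetical four-item test; both patterns below have raw score 3
items = [-1.5, -0.5, 0.5, 1.5]
pattern_1, pattern_2 = [1, 1, 1, 0], [0, 1, 1, 1]
print(estimate_person(sum(pattern_1), items) == estimate_person(sum(pattern_2), items))  # True
```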
Rating Scales. The Rasch model has been extended to rating scales (Andrich, 1978) and partial-credit observations (Masters, 1982) for polytomous items, i.e., response formats where respondents select from two or more presumably ordered response categories. The rating scale and partial credit formulations both introduce “step” values {Fk} representing the boundaries between adjacent rating categories k and k-1. To be precise, each Fk reflects the point at which categories k and k-1 are modeled to be chosen with equal probability. The two formulations differ, however, in their assumptions about the item-dependency of the step values. In particular, the rating scale model assumes that the {Fk} are the same for all items under consideration, whereas the partial credit model allows the {Fk} to vary across items (see, e.g., Wright & Masters, 1982).
The TCT as described in the online Manual uses a hybrid of these two formulations that allows items’ step values to vary across different sub-groups (g) of items. In other words, it is assumed that items share the same step values within a particular sub-group, but these values are allowed to differ from the step values for other item sets. Accordingly, the group specific step values will be denoted as {Fgk}. Like the item and person parameters, the step values are additive, thus yielding the hybrid model:
$$\ln\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_{gk} \qquad (2)$$
In the above:
· Pnik is the probability of observing category k for person n encountering item i.
· Pni(k-1) is the probability of observing category k-1 for person n encountering item i.
· Fgk is the difficulty of being observed in category k relative to category k-1, for an item in group g.
Solving for Pnik in Equation 2 (not shown; see, e.g., Wright & Masters, 1982) yields an explicit equation whose plot illustrates the quantities defined above. For instance, Figure 1 above shows the Pnik (Y-axis) for –5 < Bn < 8, with Di = 1 and Fg1 = -2, Fg2 = -1, and Fg3 = 3. In this figure, Fg1 and Fg2 are shown at –1 and 0, respectively (and Fg3 at 4), because their values (i.e., -2 and –1) are relative to the item’s overall location Di, in this case 1, on the latent dimension (B, X-axis). Going from left to right, the curves in this figure reflect the probability of observing the ratings 0, 1, 2 and 3, given B. The reader can verify that each Fgk is located at the point where responses in the two adjacent categories k-1 and k are equally probable (i.e., at the intersections of the curves). Thus, the {Fgk} reflect the categories’ interior boundaries.
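The crossing points follow directly from Equation 2: at Bn = Di + Fgk its right-hand side is zero, so categories k-1 and k are equally probable. The Python sketch below computes the category probabilities implied by Equation 2 (the usual cumulative-sum solution) and evaluates them at those points for the Figure 1 parameters; the function and variable names are illustrative.

```python
import math


def category_probabilities(b: float, d: float, steps: list[float]) -> list[float]:
    """Probabilities of categories 0..K implied by Equation 2.

    steps holds the step values F_g1..F_gK shared by the item's group g.
    """
    log_numerators = [0.0]  # category 0: empty sum
    for f in steps:
        log_numerators.append(log_numerators[-1] + (b - d - f))
    top = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Parameters used in Figure 1: D_i = 1 and F_g1, F_g2, F_g3 = -2, -1, 3
D_I, STEPS = 1.0, [-2.0, -1.0, 3.0]
for k, f in enumerate(STEPS, start=1):
    b_cross = D_I + f  # Equation 2's right-hand side is zero here
    probs = category_probabilities(b_cross, D_I, STEPS)
    print(f"B = {b_cross:+.1f}: P(k={k - 1}) = {probs[k - 1]:.3f}, P(k={k}) = {probs[k]:.3f}")
```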
It is noted that the rating-scale and the
partial-credit formulations are both special cases of Equation 2. The former
obtains when all items are in the same group, and the latter obtains when each
item defines its own separate group. Also, Equation 1 for binary items obtains
when rating scales with just two categories are used.
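The binary special case can be checked in two lines. With only categories 0 and 1 there is a single step value Fg1 and Pni0 = 1 − Pni1, so Equation 2 reduces to the expression below; assuming the common convention that an item's step values sum to zero (as the Figure 1 values do: −2 − 1 + 3 = 0), the single step is Fg1 = 0 and Equation 1 is recovered.

```latex
\ln\left(\frac{P_{ni1}}{P_{ni0}}\right)
  = \ln\left(\frac{P_{ni1}}{1 - P_{ni1}}\right)
  = B_n - D_i - F_{g1}
  = B_n - D_i \qquad \text{when } F_{g1} = 0
```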
Further Generalization. Linacre (1989) generalized Equations 1 and 2 to a Many-Facet Rasch Model by allowing the left-hand side to be affected by the effects of additional independent variables (or, “Facets”) as well. In the simplest case respondents’ trait levels can be thought of as being affected by a single variable C (e.g., respondents’ age or gender) with levels j:
$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_{gk} \qquad (3)$$
Note
that the additive properties of the model are maintained. In particular, raw
scores are sufficient statistics for the Bn, Di
and Cj, and the counts of observations in each category are
sufficient statistics for estimating the {Fk}.
Model Fit. All Rasch formulations support powerful quality-control fit statistics for assessing the conformance of the data to the model (see Wright & Stone, 1979). Practice indicates that the model is robust against many forms of misfit, and typical perturbations in the data tend to have little influence on the measure estimates. Thus, while a few misfitting items may introduce noise, the quality of measurement provided by the other items is little affected. A further feature of the model is its robustness against missing data. Since the model is parameterized at the level of individual observations, estimates are obtained only from the data that have been observed (assuming that “missing” is not in fact a response option). There is no need to impute missing data, or to assume a particular form for the distribution of parameters. Of course, missing data decrease the precision with which parameters can be estimated.
In estimating the measures, the model acts as though the randomness in the data is well behaved in accordance with the particular Rasch model being used. This is not a blind assumption, however, because the quality-control fit statistics report where, and to what extent, this requirement has not been exactly met. For instance, for each response to item i by person n, a standardized residual zni can be computed as the difference between the observed datum and its model-expected value (for binary items, the probability Pni derived via Equation 1; for rating scales, the expected rating implied by Equation 2 or 3), divided by the model standard deviation of that datum. Since such zs are approximately normally distributed, unexpected results (e.g., observations with |z| > 3) are easily identified.
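For binary items the standardized residual has a simple closed form, zni = (xni − Pni) / √(Pni(1 − Pni)). The Python sketch below (names are illustrative) shows that an endorsement by a person located four logits below an item yields z = e² ≈ 7.39, far beyond the |z| > 3 threshold.

```python
import math


def standardized_residual(x: int, b: float, d: float) -> float:
    """z_ni for a binary response x under Equation 1: (x - P) / sqrt(P (1 - P))."""
    p = 1.0 / (1.0 + math.exp(-(b - d)))
    return (x - p) / math.sqrt(p * (1.0 - p))


# A person 4 logits below an item who nevertheless endorses it
z = standardized_residual(x=1, b=-2.0, d=2.0)
print(round(z, 2), abs(z) > 3)  # 7.39 True
```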
The preceding forms the basis for computing the overall fit of the questions across respondents as quantified by their Outfit. For instance, the Outfit of item i over respondents n is:
$$\mathrm{Outfit}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2} \qquad (4)$$
Since the summed z² in Equation 4 define an approximate χ² statistic with expected value N – 1 (N being the number of respondents), the Outfit statistic ranges from 0 to ∞, with an expected value of approximately 1. Additionally, items’ Infit can be computed by weighting the terms in Equation 4 by the modeled variance of each response, which is largest when the item and person locations coincide (see Wright & Masters, 1982). Thus, the items’ Outfit is sensitive to unexpected responses across the entire range of the latent Rasch variable, including those of persons located far from the item, whereas their Infit mainly reflects localized inconsistencies among persons located near the item.
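Both mean-squares can be computed from the observed responses, their model-expected values and their model variances. In the Python sketch below (a minimal illustration with made-up inputs, not the Winsteps implementation), Outfit is the unweighted mean of z² as in Equation 4, while Infit weights each z² by its model variance, which is equivalent to dividing the summed squared raw residuals by the summed variances.

```python
def fit_mean_squares(observed: list[float], expected: list[float],
                     variances: list[float]) -> tuple[float, float]:
    """Return (Infit, Outfit) mean-squares for one item over N respondents.

    observed:  responses x_ni
    expected:  model-expected values E_ni (P_ni for binary items)
    variances: model variances W_ni (P_ni (1 - P_ni) for binary items)
    """
    squared_z = [(x - e) ** 2 / w for x, e, w in zip(observed, expected, variances)]
    outfit = sum(squared_z) / len(squared_z)  # Equation 4
    # Variance-weighted mean of z^2, i.e. sum of squared residuals / sum of variances
    infit = sum(w * z2 for w, z2 in zip(variances, squared_z)) / sum(variances)
    return infit, outfit


# Three binary responses; the last is a surprising endorsement by a
# respondent for whom the model predicted P = 0.1
infit, outfit = fit_mean_squares([1, 0, 1], [0.5, 0.5, 0.1], [0.25, 0.25, 0.09])
print(f"Infit = {infit:.2f}, Outfit = {outfit:.2f}")
```

Here Outfit (about 3.67) exceeds Infit (about 2.22) because the single unexpected response comes from a respondent located far from the item, the kind of outlier to which Outfit is most sensitive.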
Although the ideal Infit and Outfit values are 1, consistent with prevailing practice (see e.g., Bond & Fox, 2001) values in the range 0.6 to 1.4 will be considered acceptable. Note that fit values exceeding 1 indicate the presence of unmodeled variation (i.e., the data are too noisy), whereas values smaller than 1 reflect the absence of modeled noise (i.e., the responses are more deterministic than the model implies). The former is a more serious threat to model fit than the latter.
Differential Item and Test Functioning. Embretson’s (1999, p. 12, emphasis added) statement that “unbiased estimates of item properties may be obtained from unrepresentative samples” implies that item locations Di should be invariant across sub-populations of respondents. This assumption can be checked by recomputing the item locations Di in samples drawn from these sub-populations. When such checks reveal that the items’ locations systematically differ across sub-groups, we say that these items show Differential Item Functioning, or DIF. In the present context, age and gender are of particular interest because Lange, Houran et al. (2004) found that these variables yielded statistically significant DIF effects in a relationship-related context.
The finding of DIF threatens construct validity, since it implies that different sub-groups assign different semantics to the underlying variable (for a discussion, see Lange et al., 2001). The presence of DIF does not, however, imply that the measurement of the latent variable is thereby seriously compromised; i.e., there need not be Differential Test Functioning (DTF). In particular, DIF in some items may cancel that in others, thereby having little or no effect on the estimated person parameters (for examples see e.g., Lange, Irwin et al., 2000; McCutcheon et al., 2002). Unfortunately, DIF cancellation, and hence the absence of DTF, is by no means guaranteed (cf., Lange, Thalbourne et al., 2002; Lange, Houran et al., 2004) and should thus not be taken for granted.
An effective means of establishing the absence of DTF is to determine whether the raw-score to Rasch (R-to-R) measure conversions differ by more than these measures’ standard errors of measurement. In the present research this is done graphically by (a) plotting the R-to-R translation, together with the local SEB (i.e., B ± SEB), and then (b) checking whether the sub-group specific R-to-R translations fall inside this interval, except perhaps for the most extreme measures.[6] If so, it has been established that sub-groups’ estimated measures show no meaningful variation.
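The two-step check can also be expressed numerically. The Python sketch below assumes that the overall R-to-R conversion, its local SEB, and a sub-group's conversion are available as lists indexed by raw score, and flags raw scores whose sub-group measure falls outside B ± SEB. The `skip_extremes` parameter is a hypothetical stand-in for the proviso about the most extreme measures, and all numbers are invented.

```python
def dtf_flags(overall: list[float], se: list[float], subgroup: list[float],
              skip_extremes: int = 1) -> list[int]:
    """Raw scores at which a sub-group's measure lies outside overall B +/- SE_B.

    All lists are indexed by raw score; the lowest and highest `skip_extremes`
    raw scores are ignored because their measures are least precise.
    """
    last = len(overall) - 1
    flagged = []
    for score, (b, s, b_sub) in enumerate(zip(overall, se, subgroup)):
        if score < skip_extremes or score > last - skip_extremes:
            continue
        if abs(b_sub - b) > s:
            flagged.append(score)
    return flagged


# Invented conversions for raw scores 0 to 5
overall = [-3.0, -1.5, -0.5, 0.5, 1.5, 3.0]
se = [1.8, 0.9, 0.7, 0.7, 0.9, 1.8]
younger = [-3.6, -1.4, -0.6, 1.3, 1.6, 4.5]
print(dtf_flags(overall, se, younger))  # [3]
```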
In the present context, we focus on DTF related to respondents’ age and their own versus preferred partner gender.
· Age DTF is assessed by comparing the R-to-R transformation for younger (age < 35 years) vs. older (age > 35 years) respondents.[7]
· All four own vs. preferred partner gender groups are considered. Thus, denoting male as M and female as F, the four gender preference groups are FF, FM, MM, MF.
Parameter Estimation. The parameters of the Rasch models used here will be estimated using the versatile Winsteps software (Linacre, 2004). This software produces estimates of all model parameters in Equations 1 through 3 using Joint Maximum Likelihood Estimation (JMLE) procedures. These procedures are sufficiently efficient to analyze thousands of respondents and items simultaneously, while allowing group-specific rating scale parameterizations of the items. Winsteps also computes the item-total correlations and the frequency of the ratings obtained for each item, as well as the Infit and Outfit statistics discussed above.
Dimensionality. A basic assumption underlying all of the preceding is that the items under consideration define a single latent dimension. Unfortunately, it has long been known (cf. Comrey, 1978; Panter et al., 1997) that standard item-level factor analysis is inconclusive for establishing unidimensionality (or multidimensionality, for that matter).[8] To make matters worse, statistical theory (cf. Stout, 1987, 2002) shows how multidimensionality may result from DIF, a finding that was confirmed by computer simulations (Lange, Irwin et al., 2000).
The approach followed here to investigate items’
dimensionality is to analyze their residuals (see Equation 4) using
principal-component analysis because this addresses multidimensionality and DIF
simultaneously (cf. Linacre, 2004). The Winsteps software referred to above
incorporates such factor analyses as well.
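A bare-bones version of such a residual analysis is sketched below in Python with NumPy: it computes the item-by-item correlations of the standardized residuals (persons in rows, items in columns) and extracts their leading eigenvalues and loadings. This is an illustrative approximation, not the Winsteps procedure; the simulated data are hypothetical, with a secondary dimension deliberately added to half of the items so that it shows up as a large first eigenvalue.

```python
import numpy as np


def residual_pca(z: np.ndarray, n_components: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Leading eigenvalues and loadings of the residual correlation matrix.

    z: standardized residuals, shape (persons, items).
    """
    corr = np.corrcoef(z, rowvar=False)               # item-by-item correlations
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]
    top = eigenvalues[order]
    loadings = eigenvectors[:, order] * np.sqrt(top)
    return top, loadings


rng = np.random.default_rng(0)
noise = rng.standard_normal((2000, 10))  # residuals with no structure
secondary = rng.standard_normal((2000, 1))
structured = noise.copy()
structured[:, :5] += 1.5 * secondary     # items 1-5 share a second dimension

eig_noise, _ = residual_pca(noise)
eig_struct, loadings = residual_pca(structured)
print(np.round(eig_noise, 2), np.round(eig_struct, 2))
```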
Reliability. Within classical test theory "The reliability of any set of measurements is logically defined as the proportion of their variance that is true variance... We think of the total variance of a set of measures as being made up of two kinds of variance: true variance and error variance... The true measure is assumed to be the genuine value of whatever is being measured" (Guilford, 1965, p. 488). In other words,
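$$R = \frac{\sigma^2_{T}}{\sigma^2_{X}} = \frac{\sigma^2_{X} - \sigma^2_{E}}{\sigma^2_{X}} \qquad (5)$$
where σ²X = σ²T + σ²E is the total variance, σ²T the true variance, and σ²E the error variance.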
Thus, reliability (as embodied, for instance, in KR-20 or coefficient alpha) is not an index of the quality of the instrument over which it is computed; rather, it quantifies the extent to which scores can be reproduced. The major problem with the preceding definition is that it:
However, by explicitly modeling the stochastic nature of each data point Xni, Rasch scaling can identify the source of the error variance. For instance, for the binary case,
$$\operatorname{Var}(X_{ni}) = P_{ni}\,(1 - P_{ni}) \qquad (6)$$
The error variance of Rasch measures can thus be estimated from the sum of the modeled variances of the observations (in the binary case, the model error variance of Bn is the reciprocal of Σi Pni(1 − Pni)). Of course, this "model" error variance requires the data to conform stochastically to the Rasch model. Since there is always additional noise in the data, simulations (Linacre, 1997) indicate that a more appropriate estimate of the “real” error variance is:
$$\text{“Real” error variance} = \text{model error variance} \times \max(1.0,\ \text{Infit mean-square}) \qquad (7)$$
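Equations 6 and 7 translate into a few lines of Python. The sketch below (illustrative names and invented values, not Winsteps output) computes a person's model error variance in the binary case as the reciprocal of the summed Pni(1 − Pni), inflates it by MAX(1.0, Infit) as in Equation 7, and forms "model" and "real" person reliabilities as the proportion of observed measure variance that is not error variance (cf. Equation 5).

```python
import math


def model_error_variance(b: float, difficulties: list[float]) -> float:
    """Binary case: SE_B^2 = 1 / sum_i P_ni (1 - P_ni), using Equation 6."""
    info = 0.0
    for d in difficulties:
        p = 1.0 / (1.0 + math.exp(-(b - d)))
        info += p * (1.0 - p)
    return 1.0 / info


def person_reliabilities(measures: list[float], model_vars: list[float],
                         infits: list[float]) -> tuple[float, float]:
    """Return ('model', 'real') reliabilities: 1 - mean error variance / observed variance."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((b - mean) ** 2 for b in measures) / n
    model_err = sum(model_vars) / n
    real_err = sum(v * max(1.0, fit) for v, fit in zip(model_vars, infits)) / n  # Equation 7
    return (max(0.0, 1 - model_err / observed_var),
            max(0.0, 1 - real_err / observed_var))


# Invented measures, model error variances, and person Infit values
measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
model_vars = [0.4] * 5
infits = [0.8, 1.2, 1.0, 1.5, 0.9]
print(person_reliabilities(measures, model_vars, infits))
```

Because Equation 7 never shrinks the error variance, the "real" reliability (here about 0.772) can never exceed the "model" reliability (here 0.8).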
Accordingly, Rasch reliability indices tend to be lower than KR-20 and coefficient alpha. Equation 7 further implies that the “model” Rasch reliability is an upper bound, and KR-20 and coefficient alpha typically exceed even this maximum reliability, thus indicating that a test has better measurement characteristics than it actually has. To be sure, KR-20 and coefficient alpha accurately reflect the reliability of raw scores. However, raw scores are not trait measures, but rather local, test-dependent rankings, and generalizing raw scores to test-independent, generalizable measures is simply not justified. This difference is increasingly recognized, and the AERA/APA/NCME Standards recommend that the “error of measurement based on one approach should not be interpreted as interchangeable with another derived by a different technique” (Standard 2.5).
Item and Person Reliability. Although this is rarely done within the framework of classical test theory, the above applies equally to items and respondents. Thus, two types of reliability can be distinguished:
In addition to providing an
impression of the adequacy of the size of the calibration sample, the latter is
important in situations where items are selected based on their locations on
the latent Rasch dimension (e.g., in test equating and computer adaptive
testing).
Item and Person Separation. While reliability indices are widely used, their interpretation is hindered by the fact that reproducibility is not a linear function of their magnitude. For instance, the difference between the two reliability coefficients 0.55 and 0.65 is far less than that between 0.85 and 0.95. For this reason, in Rasch scaling contexts the item and person reliability coefficients (R) are often expressed as separation indices (G):
$$G = \sqrt{\frac{R}{1 - R}} \qquad (8)$$
Since R is the ratio of true to total variance (Equation 5), G equals the ratio of the true standard deviation to the root-mean-square measurement error (RMSE), and thus G ranges from 0 to ∞. The advantage of using G rather than reliability indices is that it directly reflects the number of statistically different performance strata that the test can successfully identify within a particular sample. Thus, G = 2.5 indicates that the test succeeds in distinguishing at most ⌊2.5⌋ = 2 different strata of individuals.
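Equation 8 makes the point about unequal reliability differences concrete. The Python sketch below (function name illustrative) converts the two reliability pairs mentioned above into separation indices.

```python
import math


def separation(reliability: float) -> float:
    """Separation index G = sqrt(R / (1 - R)), Equation 8."""
    return math.sqrt(reliability / (1.0 - reliability))


for low, high in [(0.55, 0.65), (0.85, 0.95)]:
    print(f"R {low} -> {high}: G {separation(low):.2f} -> {separation(high):.2f}")
```

The first pair moves G only from about 1.11 to 1.36, whereas the second moves it from about 2.38 to 4.36.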
Fisher (1992) takes a slightly more liberal approach by defining the number of Discernible Strata as (4 G + 1) / 3. The rationale for this definition is that the functional range of typical measures is around 4 True SD. In most cases, it is reasonable to inflate this by 1 RMSE to allow for the error in the observed measures. If we then define a significant difference between two measures as requiring a difference of at least three RMSE, dividing the range of 4 True SD + 1 RMSE by 3 RMSE (and recalling that G = True SD / RMSE) shows that there are (4 G + 1) / 3 significantly different levels in the functional measurement range.
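Applied to the earlier example of G = 2.5, Fisher's definition yields roughly 3.67 discernible strata, compared with the more conservative ⌊G⌋ = 2. A one-function Python sketch (name illustrative):

```python
def discernible_strata(g: float) -> float:
    """Fisher's (1992) number of discernible strata: (4 G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0


print(round(discernible_strata(2.5), 2))  # 3.67
```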
[Figure 2: SEB plotted against person measures Bn for a “test” of seven hypothetical rating scale items]
Local SEB. It has long been known that “reliability depends upon the population measured as well as the measuring instrument...[hence one] should speak of the reliability of a certain instrument applied to a certain population under certain conditions” (Guilford, 1965, p. 439). Regardless of whether reliability or separation indices are used, the fact remains that the standard error in estimating respondents’ trait levels varies across the latent dimension; hence, measurement reliability cannot be captured adequately by a single index, not even within a single sample. In this context we note that Standard 2.1 states that “For each total score, subscore, or combination of scores that is to be interpreted, estimates of relevant reliabilities and standard errors of test measurement or test information should be reported.” Taken literally, this means that such information should be available for each possible TCT measure, and this is the approach taken here.
In this context we note that according to the Rasch rating scale model[9] the reliability with which a person’s trait level can be assessed varies directly with the number of step values Fgk that lie near this person’s location Bn on the latent dimension (cf., Wright & Masters, 1982). As measures become increasingly extreme, the density of nearby step values must eventually decrease. Hence, the standard error SEB associated with extreme (i.e., relative to the available Fgk) person measures Bn is greater than the SEB for Bn closer to the bulk of the Fgk. This fact is illustrated in Figure 2, which shows a plot of the SEB against the person measures Bn derived from a “test” consisting of seven hypothetical rating scale items (Note: additional plots based on actual data will be given in Section 4 below).
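A curve of this kind can be approximated with the Python sketch below, which relies on the standard Rasch result that an item's information at Bn equals the model variance of its rating, so that SEB = 1 / √(Σi Var(Xni)). The seven item locations and the shared step values are hypothetical, chosen only to show SEB rising as Bn moves away from the bulk of the item and step locations; they are not the values behind Figure 2.

```python
import math


def category_probabilities(b: float, d: float, steps: list[float]) -> list[float]:
    """Rating-scale category probabilities for categories 0..K (cf. Equation 2)."""
    log_numerators = [0.0]
    for f in steps:
        log_numerators.append(log_numerators[-1] + (b - d - f))
    top = max(log_numerators)
    weights = [math.exp(x - top) for x in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def standard_error(b: float, difficulties: list[float], steps: list[float]) -> float:
    """SE_B = 1 / sqrt(test information); item information = rating variance."""
    information = 0.0
    for d in difficulties:
        probs = category_probabilities(b, d, steps)
        mean = sum(k * p for k, p in enumerate(probs))
        information += sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return 1.0 / math.sqrt(information)


# Seven hypothetical rating-scale items sharing one set of step values
ITEMS = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
STEPS = [-1.0, 0.0, 1.0]
for b in [-6, -3, 0, 3, 6]:
    print(f"B = {b:+d}: SE_B = {standard_error(b, ITEMS, STEPS):.2f}")
```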
4. Scaling Results
Respondents. The scaling of the TCT™ reported here is based on the responses of 11,576 users of TRUE.com’s online dating service. This sample comprised 5769 men and 5807 women with a mean age of 35.3 years (Median = 34, Range = 17 to 84 years). The distribution of respondents’ sexual preferences, as inferred from their own gender and the preferred gender of their possible partners [both M(ale) vs. F(emale)], was: MM = 213, MF = 5556, FM = 5508, FF = 299. Regardless of their fit to the Rasch model (or lack thereof), no respondents were excluded from the analyses. The frequencies of the responses to the various options of the 218 selected questions are listed in Appendix B in the online Manual.
Item Fit. To obtain a baseline, a series of
analyses were performed to determine the items’ fit to Equation 3 shown earlier
in Section 3 by treating 218 active non-adaptive items as a single scale.
Similar analyses were then performed over the items in the seventeen most
important subscales. For reasons that were discussed in the introduction, we
identify the items as well as the subscales by numeric tags only.
Appendix C shows the locations Dgi of all 218 items, together with the standard errors of estimate SEDi, as well as these items’ Infit, Outfit, and Item-Total correlations.[10] Rather surprisingly, and indicative of low dimensionality, the fit of the items to a single Rasch dimension is quite good. Except for one item (Item 65), all Outfit values fall within the standard acceptable range (i.e., 0.6 < Outfit < 1.4). Also, just 5 of the 218 items show negative Item-Total correlations. However, the results of a principal-component analysis of the items’ residuals (not shown) revealed substantial loadings on the first residual factor. Accordingly, it is meaningful to consider additional factors.
Subfactors. The seventeen factors studied next were
labeled as Factors 10, 18, 19, 29, 35, 42, 52, 71, 72, 73, 75, 76, 82, 84, 85,
88, and 90.[11] The results
of the Rasch analyses of these factors are reported in Tables 2 through 18
below. It can be observed that the items show excellent fit to the Rasch model,
as indicated by the acceptable Outfit values and positive Item-Total
correlations (with very few exceptions, as is indicated by boldface entries).
Accordingly, the internal structure of these factors supports the assumption
that the items indeed define a latent dimension in accordance with the scaling
assumptions of the Rasch model.
Table 2: Factor 10
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0038 | -0.93 | 0.01 | 0.92 | 0.92 | 0.36
I0072 | -0.90 | 0.01 | 0.89 | 0.85 | 0.44
I0083 | -0.81 | 0.01 | 0.91 | 0.88 | 0.43
I0077 | -0.71 | 0.01 | 0.97 | 0.98 | 0.36
I0041 | -0.60 | 0.01 | 0.88 | 0.86 | 0.46
I0058 | -0.43 | 0.01 | 0.95 | 0.97 | 0.40
I0234 | -0.16 | 0.02 | 1.02 | 0.99 | 0.14
I0201 | 0.06 | 0.01 | 0.96 | 0.96 | 0.49
I0170 | 0.53 | 0.01 | 1.02 | 1.51 | 0.29
I0090 | 0.62 | 0.01 | 1.14 | 1.16 | 0.30
I0049 | 1.01 | 0.01 | 1.16 | 1.33 | 0.26
I0048 | 2.31 | 0.02 | 1.10 | 1.24 | 0.00
Table 3: Factor 18
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0180 | -1.07 | 0.01 | 1.00 | 1.02 | 0.25
I0203 | -1.03 | 0.01 | 1.00 | 1.01 | 0.25
I0043 | -0.87 | 0.01 | 1.06 | 1.26 | 0.17
I0053 | -0.82 | 0.01 | 0.97 | 1.01 | 0.30
I0054 | -0.82 | 0.01 | 0.98 | 1.03 | 0.29
I0130 | -0.31 | 0.02 | 1.07 | 1.12 | 0.07
I0005 | 0.04 | 0.01 | 1.00 | 1.04 | 0.39
I0074 | 0.22 | 0.01 | 0.89 | 0.89 | 0.49
I0172 | 0.35 | 0.01 | 1.05 | 1.05 | 0.21
I0115 | 0.41 | 0.01 | 1.06 | 1.05 | 0.12
I0060 | 0.92 | 0.01 | 0.98 | 1.02 | 0.41
I0163 | 0.93 | 0.01 | 0.92 | 0.94 | 0.45
I0113 | 0.97 | 0.01 | 0.96 | 0.99 | 0.41
I0096 | 1.07 | 0.01 | 0.99 | 0.99 | 0.37
Table 4: Factor 19
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0053 | -0.71 | 0.01 | 0.92 | 0.89 | 0.45
I0054 | -0.71 | 0.01 | 0.94 | 0.91 | 0.43
I0040 | -0.70 | 0.01 | 0.81 | 0.76 | 0.54
I0006 | -0.52 | 0.01 | 0.85 | 0.83 | 0.52
I0129 | -0.41 | 0.01 | 0.89 | 0.88 | 0.50
I0058 | -0.36 | 0.01 | 0.83 | 0.83 | 0.55
I0151 | -0.01 | 0.01 | 0.93 | 0.93 | 0.52
I0193 | 0.89 | 0.01 | 1.16 | 1.20 | 0.09
I0163 | 1.23 | 0.01 | 1.25 | 1.48 | 0.23
I0113 | 1.29 | 0.01 | 1.30 | 1.72 | 0.17
Table 5: Factor 29
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0138 | -1.15 | 0.01 | 0.99 | 1.02 | 0.23
I0038 | -1.13 | 0.01 | 1.00 | 1.08 | 0.21
I0147 | -1.12 | 0.01 | 0.99 | 1.10 | 0.21
I0076 | -1.01 | 0.01 | 1.02 | 1.07 | 0.22
I0088 | -0.46 | 0.01 | 0.99 | 0.99 | 0.34
I0150 | -0.39 | 0.01 | 1.07 | 1.11 | 0.19
I0201 | -0.17 | 0.01 | 0.91 | 0.91 | 0.47
I0090 | 0.35 | 0.01 | 0.94 | 0.94 | 0.43
I0194 | 0.42 | 0.01 | 0.99 | 1.00 | 0.40
I0164 | 0.44 | 0.01 | 1.06 | 1.07 | 0.13
I0190 | 0.50 | 0.01 | 0.96 | 0.97 | 0.40
I0049 | 0.70 | 0.01 | 0.97 | 1.00 | 0.38
I0188 | 1.06 | 0.01 | 1.05 | 1.07 | 0.21
I0048 | 1.99 | 0.02 | 1.02 | 1.06 | 0.11
Table 6: Factor 35
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0064 | -1.57 | 0.01 | 1.18 | 1.22 | 0.36
I0185 | -0.85 | 0.01 | 0.93 | 0.96 | 0.50
I0057 | 0.11 | 0.11 | 1.11 | 1.16 | 0.54
I0194 | 0.11 | 0.01 | 0.84 | 0.80 | 0.65
I0066 | 0.40 | 0.13 | 0.72 | 0.68 | 0.69
I0104 | 0.50 | 0.01 | 0.82 | 0.76 | 0.63
I0143 | 0.52 | 0.01 | 0.79 | 0.71 | 0.64
I0091 | 0.79 | 0.01 | 0.90 | 0.78 | 0.57
Table 7: Factor 42
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0155 | -0.76 | 0.01 | 0.99 | 1.00 | 0.43
I0020 | -0.45 | 0.01 | 0.99 | 0.99 | 0.49
I0199 | -0.45 | 0.01 | 0.98 | 0.98 | 0.50
I0102 | -0.27 | 0.01 | 0.92 | 0.93 | 0.53
I0080 | -0.01 | 0.01 | 1.00 | 1.01 | 0.50
I0193 | 0.38 | 0.01 | 1.15 | 1.19 | 0.16
I0163 | 0.75 | 0.01 | 0.94 | 0.93 | 0.51
I0113 | 0.81 | 0.01 | 0.97 | 0.95 | 0.49
Table 8: Factor 52
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0192 | -0.82 | 0.02 | 0.96 | 0.96 | 0.30
I0039 | -0.54 | 0.01 | 0.98 | 0.98 | 0.39
I0144 | -0.35 | 0.01 | 0.96 | 0.96 | 0.34
I0155 | -0.26 | 0.01 | 0.91 | 0.89 | 0.47
I0199 | 0.06 | 0.01 | 0.89 | 0.88 | 0.52
I0240 | 0.39 | 0.01 | 1.06 | 1.07 | 0.38
I0035 | 0.71 | 0.01 | 1.09 | 1.11 | 0.21
I0070 | 0.82 | 0.01 | 1.15 | 1.19 | 0.27
Table 9: Factor 71
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0162 | -0.90 | 0.01 | 1.10 | 1.40 | 0.26
I0197 | -0.72 | 0.01 | 1.01 | 0.99 | 0.25
I0119 | -0.27 | 0.02 | 1.03 | 0.99 | 0.21
I0169 | -0.15 | 0.10 | 1.11 | 1.07 | 0.46
I0135 | 0.23 | 0.01 | 0.83 | 0.82 | 0.58
I0196 | 0.24 | 0.02 | 1.02 | 1.08 | 0.13
I0105 | 0.25 | 0.01 | 1.01 | 1.04 | 0.44
I0002 | 0.63 | 0.01 | 1.04 | 1.12 | 0.39
I0136 | 0.69 | 0.01 | 0.87 | 0.86 | 0.52
Table 10: Factor 72
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0138 | -1.09 | 0.01 | 0.97 | 0.97 | 0.25
I0038 | -1.06 | 0.01 | 0.98 | 1.00 | 0.24
I0076 | -0.95 | 0.01 | 0.99 | 1.01 | 0.25
I0237 | -0.57 | 0.01 | 0.99 | 1.18 | 0.33
I0088 | -0.41 | 0.01 | 1.00 | 0.99 | 0.32
I0110 | -0.16 | 0.01 | 0.97 | 0.98 | 0.37
I0228 | -0.14 | 0.02 | 0.99 | 1.00 | 0.15
I0238 | 0.23 | 0.01 | 1.00 | 1.08 | 0.19
I0170 | 0.36 | 0.01 | 0.98 | 1.40 | 0.26
I0164 | 0.47 | 0.01 | 1.00 | 1.00 | 0.21
I0049 | 0.72 | 0.01 | 1.00 | 1.06 | 0.31
I0163 | 0.77 | 0.01 | 0.97 | 1.01 | 0.35
I0136 | 0.92 | 0.01 | 0.99 | 1.12 | 0.30
I0152 | 0.92 | 0.01 | 0.99 | 1.15 | 0.30
Table 11: Factor 73
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0050 | -0.74 | 0.01 | 1.02 | 1.01 | 0.27
I0141 | -0.40 | 0.01 | 1.02 | 1.05 | 0.34
I0187 | -0.33 | 0.01 | 1.00 | 1.04 | 0.44
I0098 | -0.18 | 0.01 | 0.98 | 1.00 | 0.40
I0185 | -0.17 | 0.01 | 1.01 | 1.01 | 0.33
I0184 | 0.03 | 0.01 | 1.10 | 1.10 | 0.09
I0055 | 0.51 | 0.01 | 0.86 | 0.86 | 0.55
I0103 | 0.56 | 0.01 | 0.87 | 0.87 | 0.53
I0084 | 0.71 | 0.01 | 1.14 | 1.13 | 0.03
Table 12: Factor 75
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0264 | -1.46 | 0.02 | 1.09 | 1.27 | 0.36
I0263 | -1.42 | 0.02 | 1.03 | 1.17 | 0.39
I0262 | -1.11 | 0.01 | 0.81 | 0.73 | 0.57
I0261 | -0.72 | 0.01 | 0.74 | 0.73 | 0.65
I0259 | -0.32 | 0.01 | 0.64 | 0.62 | 0.74
I0260 | -0.24 | 0.01 | 0.72 | 0.72 | 0.71
I0257 | 0.04 | 0.01 | 0.69 | 0.67 | 0.75
I0256 | 0.06 | 0.01 | 0.73 | 0.72 | 0.73
I0238 | 0.16 | 0.01 | 1.84 | 0.96 | 0.69
I0059 | 0.55 | 0.01 | 1.09 | 1.18 | 0.61
I0153 | 0.71 | 0.01 | 1.38 | 1.75 | 0.51
I0209 | 0.99 | 0.01 | 0.89 | 0.90 | 0.67
I0226 | 1.35 | 0.01 | 1.20 | 1.27 | 0.42
I0258 | 1.44 | 0.01 | 1.19 | 1.28 | 0.53
Table 13: Factor 76
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0180 | -1.59 | 0.01 | 1.17 | 1.25 | 0.00
I0043 | -1.39 | 0.01 | 1.23 | 1.57 | -0.04
I0054 | -1.35 | 0.01 | 1.22 | 1.46 | 0.00
I0187 | -0.89 | 0.01 | 1.34 | 2.04 | 0.13
I0074 | -0.30 | 0.01 | 0.91 | 0.91 | 0.48
I0080 | -0.25 | 0.01 | 1.02 | 1.03 | 0.38
I0081 | -0.23 | 0.01 | 0.88 | 0.89 | 0.50
I0059 | -0.08 | 0.01 | 0.88 | 0.88 | 0.52
I0153 | 0.02 | 0.01 | 0.88 | 0.87 | 0.53
I0060 | 0.41 | 0.01 | 0.95 | 0.93 | 0.44
I0163 | 0.41 | 0.01 | 0.85 | 0.82 | 0.54
I0113 | 0.46 | 0.01 | 0.84 | 0.80 | 0.54
I0096 | 0.56 | 0.01 | 0.90 | 0.87 | 0.48
I0152 | 0.57 | 0.01 | 1.00 | 1.01 | 0.39
I0220 | 0.80 | 0.02 | 0.99 | 0.97 | 0.27
I0094 | 1.21 | 0.02 | 1.02 | 1.06 | 0.09
I0048 | 1.65 | 0.02 | 0.96 | 0.92 | 0.29
Table 14: Factor 82
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0204 | -1.83 | 0.04 | 1.00 | 0.99 | 0.14
I0041 | -0.35 | 0.01 | 0.95 | 0.94 | 0.44
I0019 | -0.11 | 0.01 | 0.87 | 0.83 | 0.53
I0125 | -0.02 | 0.01 | 1.04 | 1.05 | 0.31
I0234 | 0.12 | 0.02 | 1.01 | 0.96 | 0.20
I0202 | 0.25 | 0.01 | 0.88 | 0.88 | 0.57
I0205 | 0.79 | 0.01 | 1.05 | 1.06 | 0.27
I0190 | 1.15 | 0.01 | 1.16 | 1.24 | 0.35
Table 15: Factor 84
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0204 | -1.59 | 0.04 | 1.00 | 1.00 | 0.15
I0078 | -0.79 | 0.02 | 0.82 | 0.67 | 0.50
I0203 | -0.51 | 0.01 | 0.89 | 0.88 | 0.47
I0189 | -0.46 | 0.01 | 0.93 | 0.89 | 0.43
I0044 | -0.39 | 0.01 | 0.87 | 0.82 | 0.50
I0114 | -0.24 | 0.01 | 0.87 | 0.83 | 0.52
I0232 | -0.23 | 0.01 | 0.88 | 0.83 | 0.50
I0041 | -0.12 | 0.01 | 0.93 | 0.93 | 0.49
I0231 | -0.07 | 0.01 | 0.87 | 0.85 | 0.54
I0227 | 0.09 | 0.01 | 0.90 | 0.89 | 0.51
I0019 | 0.12 | 0.01 | 1.07 | 1.07 | 0.42
I0125 | 0.21 | 0.01 | 1.08 | 1.09 | 0.26
I0205 | 1.03 | 0.01 | 1.15 | 1.18 | 0.15
I0190 | 1.39 | 0.01 | 1.46 | 1.67 | 0.11
I0015 | 1.55 | 0.02 | 1.15 | 1.20 | 0.09
Table 16: Factor 85
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0040 | -0.55 | 0.01 | 0.83 | 0.75 | 0.53
I0064 | -0.55 | 0.01 | 0.92 | 0.88 | 0.40
I0077 | -0.52 | 0.01 | 0.87 | 0.83 | 0.48
I0039 | -0.43 | 0.01 | 0.93 | 0.92 | 0.41
I0006 | -0.39 | 0.01 | 0.85 | 0.81 | 0.51
I0129 | -0.29 | 0.01 | 0.88 | 0.85 | 0.49
I0237 | -0.24 | 0.01 | 1.08 | 1.11 | 0.36
I0160 | -0.12 | 0.01 | 1.03 | 1.04 | 0.24
I0088 | -0.06 | 0.01 | 1.04 | 1.04 | 0.33
I0151 | 0.07 | 0.01 | 0.93 | 0.94 | 0.45
I0222 | 0.33 | 0.01 | 1.14 | 1.34 | 0.23
I0075 | 0.56 | 0.01 | 1.32 | 1.38 | 0.09
I0238 | 0.59 | 0.01 | 0.99 | 0.99 | 0.26
I0069 | 1.61 | 0.01 | 1.17 | 1.33 | 0.00
Table 17: Factor 88
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0204 | -2.12 | 0.04 | 1.00 | 1.01 | 0.10
I0232 | -0.72 | 0.01 | 1.03 | 1.10 | 0.27
I0001 | -0.41 | 0.01 | 0.99 | 1.00 | 0.36
I0019 | -0.39 | 0.01 | 0.99 | 1.01 | 0.37
I0205 | 0.43 | 0.01 | 1.02 | 1.02 | 0.23
I0153 | 0.62 | 0.01 | 0.98 | 1.00 | 0.45
I0194 | 0.68 | 0.01 | 0.94 | 0.94 | 0.48
I0212 | 0.86 | 0.01 | 1.02 | 1.03 | 0.41
I0113 | 1.06 | 0.01 | 1.00 | 1.04 | 0.38
Table 18: Factor 90
Item | Di | SE(Di) | Infit | Outfit | r(item-total)
I0117 | -1.15 | 0.01 | 0.99 | 1.25 | 0.32
I0118 | -0.83 | 0.02 | 0.92 | 0.91 | 0.40
I0187 | -0.65 | 0.01 | 0.86 | 0.85 | 0.55
I0197 | -0.57 | 0.01 | 1.04 | 1.05 | 0.19
I0119 | -0.10 | 0.02 | 0.90 | 0.81 | 0.42
I0196 | 0.40 | 0.02 | 0.95 | 0.91 | 0.29
I0120 | 0.41 | 0.02 | 0.96 | 0.90 | 0.28
I0105 | 0.45 | 0.01 | 1.00 | 1.06 | 0.44
I0002 | 0.80 | 0.01 | 1.23 | 2.49 | 0.10
I0094 | 1.49 | 0.02 | 1.02 | 1.02 | 0.15
Figure 3
Respondents’ measures on each of the 17 subscales were subjected to a standard factor analysis. Consistent with the analysis of the items’ residuals, the plot of the components’ eigenvalues in Figure 3 provides clear evidence of multidimensionality, as several eigenvalues exceed 1.
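The eigenvalue-greater-than-1 rule used here (often called the Kaiser criterion) can be illustrated with a short sketch. It assumes the subscale measures are arranged as a respondents × subscales array and uses the eigenvalues of their correlation matrix. The data below are simulated for illustration only (they are not TCT data), and the helper name `eigen_screen` is hypothetical.

```python
import numpy as np

def eigen_screen(scores, threshold=1.0):
    """Return the eigenvalues of the subscale correlation matrix (largest first)
    and how many of them exceed `threshold` (Kaiser criterion)."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)  # columns = subscales
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # symmetric matrix: real eigenvalues, sorted descending
    return eigvals, int(np.sum(eigvals > threshold))

# Simulated (hypothetical) data: 500 respondents x 17 subscales driven by 3 latent traits.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
loadings = np.zeros((3, 17))
loadings[0, :6] = 0.8     # subscales 1-6 load on trait 1
loadings[1, 6:12] = 0.8   # subscales 7-12 load on trait 2
loadings[2, 12:] = 0.8    # subscales 13-17 load on trait 3
scores = latent @ loadings + rng.normal(scale=0.8, size=(500, 17))

eigvals, n_above_one = eigen_screen(scores)
print(np.round(eigvals, 2))
print("eigenvalues > 1:", n_above_one)
```

With three simulated traits, three eigenvalues should stand well above 1, whereas a unidimensional set of subscales would show a single dominant eigenvalue. The criterion is a heuristic; the eigenvalues of a correlation matrix always sum to the number of subscales, so their average is exactly 1.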
Differential Test Functioning. To assess whether the seventeen factors suffer from DTF related to respondents’ sexual orientation and age, separate raw-sum-to-Rasch (R-to-R) transformations were computed for the four groups MM, MF, FF, and FM, as well as for Younger and Older respondents. The results are shown pairwise in Figures 4 through 37 (i.e., the sexual orientation and age graphs for each factor are shown on a single page). The error bands (± 1 SEB) are relative to the R-to-R transformation derived for all respondents combined.
As indicated by an asterisk (*) in the titles of the graphs, sexual orientation DTF could not be fully assessed for four of the factors, given the relatively small numbers of MM and FF respondents.[12] However, the R-to-R values for the available subgroups clearly fall inside the error band of the total group’s R-to-R for each of the factors. In other words, there is no evidence that the Rasch measures show systematic bias due to respondents’ age or sexual orientation.
Reliability. Given the absence of DTF, the error bands in Figures 4 through 37 may be assumed to give accurate estimates of the local SEB for each of the factors. Note that, for reasons discussed in Section 3, the size of the error bands (B ± SEB) increases for more extreme person measures and raw sums.
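Why the SEB widens toward the extremes can be seen in a simplified raw-sum-to-Rasch conversion. The sketch below assumes dichotomous Rasch items (the TCT’s actual item format and estimation software are not described here) and uses hypothetical, evenly spaced item difficulties. For each non-extreme raw sum r it solves Σ P_i(B) = r for the person measure B by Newton-Raphson and reports SEB = 1/√I(B), where I(B) = Σ P_i(1 - P_i) is the test information. The function name `raw_to_rasch` is hypothetical.

```python
import numpy as np

def raw_to_rasch(difficulties, tol=1e-8, max_iter=100):
    """Map each non-extreme raw sum to a Rasch person measure B and its SEB,
    assuming dichotomous Rasch items with the given difficulties (logits)."""
    d = np.asarray(difficulties, dtype=float)
    n_items = d.size
    table = []
    for r in range(1, n_items):  # raw sums 0 and n_items have no finite estimate
        b = np.log(r / (n_items - r)) + d.mean()  # starting value
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-(b - d)))   # endorsement probability per item
            info = np.sum(p * (1.0 - p))         # test information at b
            step = np.clip((r - p.sum()) / info, -1.0, 1.0)  # damped Newton step
            b += step
            if abs(step) < tol:
                break
        p = 1.0 / (1.0 + np.exp(-(b - d)))
        table.append((r, float(b), float(1.0 / np.sqrt(np.sum(p * (1.0 - p))))))
    return table

# Hypothetical 10-item scale with difficulties spread from -1.5 to +1.5 logits.
for r, b, seb in raw_to_rasch(np.linspace(-1.5, 1.5, 10)):
    print(f"raw sum {r:2d}: B = {b:+.2f}, SEB = {seb:.2f}")
```

In this sketch the SEB is smallest near the middle raw sums, where the items provide the most information, and grows toward the lowest and highest raw sums, mirroring the widening error bands described above.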
In addition, for each of the seventeen factors, Table 19 lists the Rasch reliability indices as well as the separation values (G) for both the items and the respondents. As noted earlier, the person reliability corresponds most closely to the reliability estimate provided by Coefficient Alpha (or KR-20) within the framework of classical test theory. However, owing to its more realistic error assumptions, the Rasch reliability coefficient tends to be lower than reliability estimates obtained within the classical framework.
Table 19: Item and person reliability and separation (G) estimates
Item Set | Person Separation (G) | Person Reliability | Item Separation (G) | Item Reliability
All Items | 2.25 | 0.84 | 27.75 | 1.00
Factor 10 | 0.84 | 0.41 | 75.79 | 1.00
Factor 18 | 0.80 | 0.39 | 69.52 | 1.00
Factor 19 | 1.06 | 0.53 | 74.71 | 1.00
Factor 29 | 0.76 | 0.37 | 80.37 | 1.00
Factor 35 | 1.01 | 0.50 | 12.29 | 0.99
Factor 42 | 1.11 | 0.55 | 53.06 | 1.00
Factor 47 | 0.51 | 0.21 | 51.82 | 1.00
Factor 71 | 0.51 | 0.21 | 14.40 | 1.00
Factor 72 | 0.52 | 0.21 | 64.44 | 1.00
Factor 73 | 0.67 | 0.31 | 47.95 | 1.00
Factor 75 | 2.40 | 0.85 | 80.82 | 1.00
Factor 76 | 1.22 | 0.60 | 76.64 | 1.00
Factor 82 | 0.66 | 0.31 | 45.47 | 1.00
Factor 84 | 1.13 | 0.53 | 50.45 | 1.00
Factor 85 | 0.85 | 0.42 | 61.15 | 1.00
Factor 88 | 0.67 | 0.31 | 58.44 | 1.00
Factor 90 | 0.66 | 0.31 | 48.65 | 1.00
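In Rasch analysis, separation G and reliability R express the same ratio of “true” to error variance: R = G²/(1 + G²), or equivalently G = √(R/(1 - R)). The sketch below (all function names are hypothetical) computes both indices from a set of measures and their standard errors, and converts some of the person separations in Table 19 into reliabilities; for example, G = 2.25 gives R ≈ 0.84 and G = 0.51 gives R ≈ 0.21, in line with the tabled values.

```python
import numpy as np

def separation_to_reliability(g):
    """Rasch reliability implied by a separation index G."""
    return g**2 / (1.0 + g**2)

def reliability_to_separation(rel):
    """Separation index G implied by a Rasch reliability R (0 <= R < 1)."""
    return np.sqrt(rel / (1.0 - rel))

def rasch_reliability(measures, standard_errors):
    """Reliability and separation from estimated measures and their SEs:
    true variance = observed variance - mean error variance."""
    m = np.asarray(measures, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    observed_var = m.var(ddof=1)
    error_var = np.mean(se**2)
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var, float(np.sqrt(true_var / error_var))

# Person separations taken from Table 19.
for label, g in [("All Items", 2.25), ("Factor 10", 0.84), ("Factor 47", 0.51), ("Factor 75", 2.40)]:
    print(label, round(separation_to_reliability(g), 2))
```

Because R approaches 1 quickly as G grows, even the smallest item separation in Table 19 (12.29, for Factor 35) already corresponds to an item reliability of 0.99.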
Note that the Rasch reliability coefficients of the person measures produced by some of the subscales are rather small (e.g., Factors 47, 71, 72, and 73). It should be kept in mind, however, that these subscales are not used in isolation; rather, they contribute jointly to the matching process. Also, within the TCT matching algorithm, the less reliable subscales do not form the decisive piece of information used to match two individuals.
Not surprisingly, given the sample size, the reliability of the item locations is very high (0.99 to 1.00). Thus, the item locations are known with high precision, providing a sound basis for future expansion of the TCT.
Preliminary Validity Data for the TCT
Compatibility researchers (Fowers & Olson, 1986; Houran et al., 2004; Wilson & Cousins, 2002b) have conceded that longitudinal designs are needed to fully assess the efficacy of offline or online matchmaking tests. Yet, cross-sectional research on relationship satisfaction and stability can provide important preliminary data in support of a test’s validity.
To assess the predictive validity of the TCT, a separate study of 1,101 married individuals, including 112 couples, was conducted (Lange, Houran et al., 2004; Lange et al., submitted). This study is described in its entirety in Appendix D of the online Manual; we summarize its major findings below.
· The Pearson correlation between respondents’ satisfaction and compatibility measures is highly significant (r = 0.26, p < .001), while its modest size supports the notion that these variables form two different factors. This correlation rises to 0.46 when corrected for attenuation, an effect size that exceeds the validity findings reported by Wilson and Cousins (2003b) for the WRCI measure of compatibility. Thus, there is solid evidence that couple similarity is associated with greater marital satisfaction.
· Yet these two variables should not be treated as identical, because a two-dimensional Rasch model in which the marital satisfaction and complementarity items are treated as separate factors provides significantly better fit than a one-dimensional model comprising all items simultaneously (χ²(3) = 1501.52, p < .001).
· Consistent with the complementarity hypothesis, the level of satisfaction produces powerful item shifts (DIF). In particular, those in the High satisfaction group disproportionately endorse items that address couples’ division of responsibility, dealing with stress, conflict resolution, and values as a couple. Equivalently, those Low in satisfaction are particularly dissatisfied with these issues. Surprisingly, the High group is less satisfied than expected with respect to value differences between partners, and a large difference between this item’s locations in the Low and High groups is observed (1.22 Logits, SEdif = 0.10, z = 11.61). Although direct evidence is lacking, we hypothesize that this is because differences in personal values are difficult to resolve by behavioral accommodation.
· Extremely powerful item shifts (DIF) also occur in the complementarity items between the Low and High satisfaction groups as defined earlier: the item locations in these groups differ by at least 0.49 Logits (all shifts are statistically significant, p < .01, two-sided). The findings indicate that respondents’ patterns of complementarity ratings differ greatly with their marital satisfaction. Specifically, those most satisfied in their marriages report the lowest complementarity with their partners with respect to spending and saving money. However, respondents in the Low satisfaction group report the greatest lack of complementarity with respect to sexual issues and parenting. In other words, and perhaps not surprisingly, dissatisfaction in marriage quickly manifests itself as a lack of complementarity with respect to sexual and parenting issues.
· Location shifts in the complementarity items due to differences in satisfaction are strong enough to distort measurement. In the present sample, over 40% of the respondents would receive significantly different Rasch measures depending on whether these are derived from the Low or the High satisfaction translation function. Interestingly, the estimated person measure R_r for a raw sum of r = 24 derived from the Low satisfaction group (1.11 Logits) exceeds the estimate for a raw sum of r = 26 derived from the High group’s data (1.03 Logits). Thus, the satisfaction-related item shifts are large enough to introduce measurement distortions at the ordinal level as well.
· A multivariate analysis of variance of the satisfaction and complementarity variables by Sex showed a multivariate main effect (F(2, 1098) = 7.44, p < .001). Subsequent univariate analyses of variance (ANOVA) indicated that Sex had a significant effect on satisfaction, as men expressed greater overall marital satisfaction than did women (M = 1.76 for men vs. 1.38 Logits for women, F(1, 1099) = 14.22, p < .001, MSe = 2.65). By contrast, women’s and men’s complementarity appears to be highly similar (M = 1.35 for men vs. 1.33 Logits for women, F(1, 1099) = 0.05, p > .50).
· DIF effects may seriously affect the quantitative properties of latent variables. Moreover, statistical theory (Stout, 1987) and computer simulations alike (Lange et al., 2000) indicate that item shifts may create spurious factor-analytic structures. Accordingly, it is no longer obvious that widely cited results within the literature on assortative mating (historical or current) should be accepted at face value. For example, it seems likely that the notion of love as consisting, to varying extents, of Romantic Dependency, Communicative Intimacy, Physical Arousal, Respect, and Romantic Compatibility (Critelli et al., 1986) varies depending on partners’ overall relationship satisfaction. Instead, consistent with Masuda’s (2003) recent review, we suspect that Erotic and Companionate Love are the major qualitative components, since these most resemble complementarity and satisfaction, respectively. Finally, and importantly for the increasingly popular online matchmaking businesses, we note that these findings of qualitative differences cast serious doubt on simple formulaic prescriptions for romantic compatibility and relationship success (cf. Hoffman & Weiner, 2003; Wilson & Cousins, 2003b). Thus, TRUE’s decision to incorporate non-linear elements into its matching algorithms is warranted and builds on the theoretical foundation established by Gottman et al. (2002).
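The correction for attenuation used in the first finding above divides the observed correlation by the square root of the product of the two measures’ reliabilities, r_corrected = r_xy / √(r_xx · r_yy). The reliabilities used in the validity study are not reported in this section, so the sketch below only back-calculates the reliability product implied by the reported values (0.26 observed, 0.46 corrected), which comes to roughly 0.32; the individual reliabilities in the second call are hypothetical.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation."""
    return r_xy / math.sqrt(rel_x * rel_y)

def implied_reliability_product(r_observed, r_corrected):
    """Product rel_x * rel_y implied by an observed and a corrected correlation."""
    return (r_observed / r_corrected) ** 2

product = implied_reliability_product(0.26, 0.46)
print(f"implied rel_x * rel_y = {product:.3f}")
# Hypothetical reliabilities whose product is close to that value:
print(f"corrected r = {disattenuate(0.26, 0.57, 0.56):.2f}")
```

Because the reported correlations are rounded to two decimals, the back-calculated product is only approximate.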
Summary: Relation to AERA/APA/NCME Test Construction Standards
Here we summarize how the evidence reported in the preceding sections pertains to the standards set forth in the 1999 Standards for Educational and Psychological Testing as issued jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (AERA, APA, & NCME, 2002).
Tables 20 and 21 list the applicable Validity and Reliability Standards, respectively, each paired with a summary of the pertinent evidence described in this manual or a reference to the relevant section.
Table 20: Applicable AERA / APA / NCME Validity Standards
Applicable Standard[13] | Summary, or reference to section within this manual
1.1 Provide rationale for use of test scores, interpretation, and theory. | See Section 2, Section 5, and Appendix C (online Manual).
1.2 Define interpretation, intended population and construct. | Online daters, evident from context.
1.5 Sample composition for validation. | Different by necessity. See Section 5 and Appendix C (online Manual).
1.6 Appropriateness of test content, procedures. | Content: Table 1, Appendix A (online Manual). Validation used the online method only.
1.10 Interpretation of specific items. | Matching is based exclusively on factors and subfactors comprising several items.
1.12 Rationale for composite or difference scores. | Rasch scaling. See Section 2.
1.13 Describe conditions under which validation data were collected. | See Section 5 and Appendix C (online Manual). Validation is based on data that were gathered online.
1.14 Rationale and measurement of “other” variables. | Simple demographic classification variables whose validity seems evident are used. The TCT showed no DTF related to age and sexual orientation.
1.15 Information concerning criterion performance should be provided. | User feedback is carefully qualified when predicting respondents’ compatibility levels.
1.22 When test results imply a particular outcome, provide users with the basis for prediction. | This manual is publicly available on TRUE’s website. The validation study (Appendix C, online Manual) has been submitted for publication.
1.24 Investigate unintended consequences. | Extensive DTF tests were performed, ruling out age and sexual orientation as likely sources of bias.
Table 21: Applicable AERA / APA / NCME Reliability Standards
Applicable Standard[14] | Summary, or reference to section within this manual
2.1 For each score, subscore, or combination, estimates of reliability and standard errors of estimate should be given. | See Sections 2 and 4, as well as the SEB in Figures 4 through 37.
2.2 Standard error of measurement should be reported in raw score as well as scale units. | All measures are in logits as derived from the Rasch model.
2.4 and 2.11 Consistency of scores should be expressed for subgroups. | No noticeable DTF was found across age and sexual orientation. Hence the SEB values generalize across these groups.
2.5 Reliability coefficients appropriate to the scaling method should be reported. | Reliability is expressed as Rasch reliability, separation, and SEB; none is expressed in terms of Classical Test Theory.
2.7 When subsets of items define partially independent traits, reliability estimates should recognize the multifactor character of the instrument. | Reliability, separation, and the SEB are computed separately for each factor.
2.10 When subjective judgment enters into scoring, the inter-rater as well as within-examinee consistency should be reported. | All scoring is performed by software in an online context; no subjective ratings are involved.
2.11 Publishers should provide reliability data as soon as feasible for each major population for which the test is recommended. | No noticeable DTF was found across age and sexual orientation. Hence the SEB values also generalize across these groups.
2.14 Conditional standard errors of measurement should be reported at several score levels. | Reliability is expressed as Rasch reliability, separation, and SEB. The latter are local and vary across “scores.”
Discussion
This report detailed the development of a comprehensive online compatibility test for both heterosexual and same-sex partnering. To our knowledge, the TCT is the only such instrument demonstrated to meet professional standards as outlined by the AERA, APA, and NCME (2002). In addition, the TCT addresses several other pertinent issues:
These additional features of the TCT directly address the cautions of Finn and Banach (2000) and Houran (2004; Houran et al., 2004) regarding the lack of standards and regulations for online human service practices. We argue that standards similar to those demonstrated here for the TCT should apply to all online testing services that portray themselves as following the ethical principles of psychological and testing professionals. In this way, consistent standards and regulations for online human service practices will be maintained, and consumer confidence in, and benefit from, such services should increase. This is more than an academic issue. To date, at least one media report (Hahn, 2005) has exposed the trend for matchmaking companies to profit from unsubstantiated personality and compatibility tests. Thompson et al. (2005) recently touched on several of these points as well.
Despite the advances represented by the TCT, two main criticisms can be leveled. First, issues of generalizability accompany any non-random sampling procedure in the context of test development and validation. In addition, several authors (Burgess et al., 2001; Mathy et al., 2003) have emphasized the unique methodological issues related to conducting research via the Internet. However, as we have explained here and elsewhere (Houran et al., 2004), Rasch scaling overcomes some of these drawbacks in that it yields essentially population-free parameters. Second, it can be argued that the TCT (like other compatibility tests) is incomplete because it does not take into account physical attraction, that indefinable variable of romantic “chemistry.” To be sure, psychologists widely accept that love has at least two primary facets, known as Passionate-Erotic Love and Companionate Love. Passionate Love is associated with sexual desire for a partner, whereas Companionate Love represents friendship-type platonic love towards a partner (for a review, see Masuda, 2003). Sternberg (1986) expands this conceptualization in his Triangular Theory of Love. According to Sternberg, the amount of love or relationship satisfaction that a person experiences is due to the strength and interaction of three components: Intimacy (the feeling of closeness and bondedness), Passion (the drives that produce romance, physical attraction, and sexual intercourse), and Decision/Commitment (the decision that one loves another and the commitment to continue that relationship).
Clearly, then, physical attraction is an important component in the major models of romantic compatibility and relationship development. However, the TCT was designed to address only those variables that appear to help promote and sustain Companionate Love, or what Sternberg might regard as Intimacy and Commitment. The same can be said of the Wilson Relationship Compatibility Indicator (Wilson & Cousins, 2003b) and of various compatibility tests that have not been scientifically substantiated. Most online matchmaking services either adopt physical attractiveness rating systems (e.g., TRUE.com) or simply leave it to the test-takers to judge the attractiveness of romantic prospects. In short, virtually all compatibility testing methods amount to psychological and behavioral profiling and matching. Still, we would be remiss not to mention a few notable attempts to pseudo-quantify romantic “chemistry.” Wilson and McLaughlin (2001) nicely summarized a wealth of literature on the psychology of perceived beauty and attractiveness. To be sure, there do appear to be some characteristics, such as young age, facial symmetry, and a certain hip-to-waist ratio, that are nearly universally associated with attractiveness.
Of course, common sense and personal experience tell us that physical attraction is also a highly idiosyncratic phenomenon. To this end, the testing firm weAttract.com has developed a computerized “physical attraction test” that uses a person’s preferences, mapped from a set of prototype faces and body types, to find photographs of individuals in a pool of online daters whom that person will find attractive. Recently, at the 2005 iDate Conference, Fuji Film introduced facial recognition software that parallels the pioneering efforts of weAttract.com. This software reportedly searches online dating profiles for matches to photographs that a person finds attractive. Thus, if an online dater finds Person A and Person B attractive from their photographs, this software will locate other candidates in an online dating pool who resemble Person A and Person B. As noted by Thompson et al. (2005), it remains to be seen whether psychological and physical compatibility can efficiently and validly be synthesized into a single compatibility test and matching system. We anticipate that any successful efforts along these lines would significantly increase the validity of a compatibility test in predicting relationship satisfaction and stability. In the meantime, we appreciate Wilson and Cousins’ (2003a) perspective on the current state of compatibility testing: “It will not tell you whether or not you are going to fall in love with another person in a compulsive, ‘chemical’ way, just whether or not it is a good idea if you do” (p. viii).
Finally, the findings from the TCT development and validation bear on the question of which assortative mating model yields greater relationship quality: “birds of a feather flock together” or “opposites attract.” We argue that the conflict between the models of similarity and complementarity is largely illusory. Specifically, the TCT research showed that (i) the variables that defined relationship satisfaction formed a hierarchy, and (ii) men and women differed quantitatively and qualitatively on those relationship variables. These findings suggest that men and women in satisfying long-term relationships agreed on which variables affected their relationship quality, but did not have to agree on the relative importance of specific variables to achieve that satisfaction. In other words, relationship satisfaction appears to be grounded partly in cognitive-behavioral processes, rather than being dependent upon patterns of gross similarity or dissimilarity. Moreover, there were clear gender differences in what makes a satisfying and stable relationship.
Furthermore, the fact that the TCT measure of romantic compatibility uses a planned mixture of similarity and complementarity might partly explain why its scores correlated more highly with relationship quality than did the similarity-based WRCI measure of compatibility reported by Wilson and Cousins (2003b). In addition, future research should re-examine the veracity of existing findings using more sophisticated Rasch scaling and Item Response Theory methodologies rather than Classical Test Theory approaches. Our research raises serious questions as to whether widely accepted findings are partly or wholly the result of artifacts related to test bias. To be sure, the cumulative results we have obtained thus far challenge the validity of simplistic models of relationship quality in terms of gross similarity or complementarity. Rasch scaling has allowed us to move beyond “variable-centered” models of assortative mating toward a quantitative and qualitative schema, or “couple-centered,” approach (cf. Luo & Klohnen, 2005). Therefore, broadly speaking, couples with satisfying and stable relationships seem to be distinguished by their ability to integrate qualitatively different issues into the relationship via complex mental processes. Research is underway to understand these complexities in more detail.
Acknowledgments
Development of the TCT was funded by Herb D. Vest. We thank P. Jason Rentfrow, Andy Metcalf, and Kerry McKenna for their assistance in this research and preparation of this report.
References
· Naglieri, J. A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., & Velasquez, R. (2004). Psychological testing on the Internet: New problems, old issues. American Psychologist, 59, 150-162.
· TRUE, LLC, & Jerabek, I. (2004). The technical manual for the TRUE Compatibility Test (TCT). Irving, TX: Author.
Additional Factors of the True Compatibility Test
Note 1: Factors 1-13 are shown in Table 1 in the main text.
|
Name |
Description |
14)
|
Expression of emotions |
Expression of emotions
measures the degree to which the test-taker is comfortable expressing his or her
emotions. This was included because
communication of emotions has been shown to be an important factor in
relationship success. |
15) |
Comfort witnessing display of emotion |
This subscale evaluates the
degree to which the test-taker is comfortable around other people who express
their emotions. This is important in
helping to identify those who would have difficultly being with people who
express their emotions often. It is
also important for the overall success of test-taker’s relationships – being
able to at least listen when a partner needs to express how they feel is
extremely important. |
16) |
Need for intellectual discussions |
The need for intellectual
discussions scale measures how much intellectual discourse the test-taker
prefers having in his/her life. This is important to ensure that they will be
matched with someone who they can converse with comfortably and in a manner
that satisfies their intellectual curiosity. |
17) |
Tolerance for differences in opinion |
The tolerance for differences of opinion scale measures how open
one is to really accepting differences in opinion with others. This is extremely important to the overall
success of relationships because differences are bound to arise at some point
– they can be a potential stumbling block that subjects should be aware
of. |
18) |
Tolerance for goal differences - support |
The tolerance for goal
differences measures how willing the subjects are to compromise when their
goals are not congruent with those of their partners. Obviously, this has implications for the
long-term survival of the relationship.
|
19) |
Communication skills |
The communication skills scale
measures how effectively the subjects are able to get their points across in
a sensitive and effective manner and how well they are able to receive
messages from others. Communication
is one of the major keys to a successful relationship. |
20) |
Flexibility |
This scale evaluates whether
the test-taker is able to accept or adjust to changed plans, new ideas, or
strange situations. Being flexible
allows for compromise in relationships. |
21) |
Tolerance |
How open and accepting the
test-taker is can spell trouble or success for future relationships. Essentially, this scale measures whether
the test taker is open to different opinions, and whether he or she is
critical of others. |
22) |
Assertiveness |
The assertiveness scale measures
how skilled the test-taker is at letting others know what they need and want
in a polite manner. Assertiveness is
essential to getting one’s needs met in a relationship. |
23) |
Sex life: Desired frequency (libido) |
This scale measures how high
the test-taker’s libido is. This is
important for sexual compatibility, as frequency of sexual contact is often a
source of conflict in long-term relationships. |
24) |
Sex life: Attitudes |
This scale measures whether subjects
have conservative or liberal attitudes toward sexuality. Having similar values here is important
for establishing sexual rapport. |
25) |
Sex life: Behavior |
The behavior scale evaluates
how conservative or liberal each subject's behavior is in the bedroom. Here too, similar comfort levels are a
must. |
26) |
Sex life: Experience |
Experience measures how
sexually experienced the test-taker is, according to self-report. This is not just about the number
of partners, but also about the variety of behaviors experienced. For some people, a partner's level of sexual
experience is a major issue. |
27) |
Sex life: Faithfulness |
Faithfulness measures how
sexually faithful the test-taker has been in past relationships,
according to self-report. Past
behavior often predicts future behavior, which is why this scale was included on
the test. |
28) |
Social skills and agreeableness |
This scale evaluates how
socially skilled the test-taker is and how pleasant he or she is to interact with. It is important both within the
relationship and in the couple's relationships with others. |
29) |
Extroversion |
Extroversion places the test-taker on a continuum
from introverted (oriented towards inner reflection and alone time) to
extroverted (oriented outwards, socially oriented, and gregarious). |
30) |
Individual Friendships |
This scale evaluates the
amount of time the test-taker wants to spend with their partner as opposed to
with their own friends. |
31) |
Optimism/positive attitude |
Optimism/positive attitude
measures the participants’ outlook on life – optimistic or pessimistic. There’s little doubt that a positive attitude
is healthier in life, but not everyone is best paired with an optimist. |
32) |
Security in a relationship |
The security in a relationship scale measures how confident the
test-takers feel in a relationship: whether they need constant reassurance,
or whether they are able to feel secure without it. |
33) |
Readiness to commit |
Readiness to commit
evaluates how prepared the test-taker is to commit, including whether or not marriage
is something he or she wants. Being on the same
page here can help avoid conflict and hurt feelings. |
34) |
Selflessness |
This scale measures the
subject's willingness to put the needs of others ahead of his or her
own. In every relationship, there are
times when selflessness is necessary, when one must give up something and
concentrate on a partner's well-being.
The ability to be selfless from time to time is an asset in
relationships. |
35) |
Standards (Other-oriented perfectionism) |
Standards refers to how
critical one is of others. Being extremely hard on others or insisting that
things be done one's own way can spell trouble for relationships. Although there is nothing wrong with having
some standards, it is important to keep other-oriented perfectionism in
check. |
36) |
Dependability/reliability |
Dependability/reliability
measures how reliably subjects accomplish the things they tell others
they are going to do. How well
someone can be relied upon is extremely important for establishing trust in a
relationship. |
37) |
Orderliness |
This scale measures how tidy
the test-takers are. How neat and
structured one tends to be is important later in a relationship, once two
people share a space. Having compatible
levels of orderliness can be helpful for establishing a good match. |
38) |
Organization/planning |
Organization and planning
refers to the test-takers' preferences for being organized versus
disorganized, and how strictly they like to plan their time versus going with
the flow. This scale was included because of its relevance
to many different areas of life, and because agreement in this area is
important. |
39) |
Honesty |
This scale measures how
frank and upfront the test-taker is.
Although it is great to be honest, a test-taker who scores extremely
high on this scale may lack tact, failing to recognize
when a situation calls for some sensitivity. |
40) |
Self-discipline |
Self-discipline refers to
how much will-power and self-restraint the test-taker has. Obviously, how readily one gives
in to every desire has a bearing on the relationship, as two people with very
different levels may become frustrated with one another. |
41) |
Anger Management |
Having control over angry
impulses is extremely important to all sorts of relationships, particularly
romantic ones. This scale measures
whether the test-taker tends to hold anger in or to let it loose without
restraint. The ideal tends to be
somewhere in the middle. |
42) |
Open-mindedness |
The open-mindedness scale
measures whether the test-takers would rather stick to the familiar routine
or experience new things. For an
open-minded person, a relationship with a closed-minded individual can feel
stifling, while for the less open-minded partner it can feel outside of
his or her comfort zone. Compatibility is a
must. |
43) |
Sense of humor |
This scale is a self-report
measure of sense of humor. Whether
one is able to find the lighter side in even tough situations is something that
many people like to know up front about a potential partner. This is why it is important to include
sense of humor on the test. |
44) |
Attitudes and world-view |
Attitudes and world-view
measures whether one has a conservative or liberal outlook on political and
social issues. Although two people
with differing views can get along,
it is often more difficult, and for the more politically minded, it can be a
major issue. |
45) |
Energy level, zest |
Energy level and zest refers
to how energetic the test participants are.
Shared time together can be tough if the individuals have different
preferences for the intensity of shared activities. |
46) |
Spontaneity |
This scale measures how open
the test-taker is to taking off without notice, doing something different at
the spur of the moment, or simply being carefree and letting the moment
lead where it might. This scale was
included in the test because differing levels of spontaneity in a relationship
can make one partner feel pressured to be more spontaneous, and the other
to be less so. |
47) |
Ambition |
When it comes to the drive to
succeed, as measured by ambition, it is important for partners to be at least somewhat in
agreement, if not necessarily on exactly the same page. As with self-discipline, being on
completely different pages can be frustrating. Ambitious people may feel held back by less ambitious partners,
if not outright annoyed by what they see as a lack of motivation. |
48) |
Romance |
Romance measures whether one
feels that romance (e.g., gifts for no reason, love poetry, public displays
of affection) has a place in everyday life, whether it should be reserved for
special occasions, or whether it makes one totally uncomfortable. This is included because to some people, a
relationship lacking in romance is not a real relationship, while for others
romance may be a complete turnoff. |
49) |
Adherence to Routine |
Adherence to routine
measures how much the test-taker sticks to a regular schedule, rather than
deciding day to day or even moment to moment. It was included because this is another area where people may need to
compromise a bit in order to make each other happy, and it is an important potential source of contention. |
50) |
Importance of money |
The importance of money
scale refers to the value placed on acquiring security and wealth. It is important to keep this in mind when
choosing a mate, because the pursuit of money comes at a cost: less time to spend
with family and friends. A person
who places less importance on money would be less willing to compromise in
this area than someone who places greater importance on it. |
51) |
Spending habits |
The spending habits scale
measures whether the test subject tends to spend or save his or her
money. From frugal to spendthrift, this
preference is something that can have an effect from the first meeting,
especially if differences are great. |
52) |
Tactfulness |
Tactfulness measures whether
the test-taker is sensitive to the feelings of others, and guards against
saying hurtful things. There are
times in all relationships where one must tread carefully with the feelings
of loved ones, so this is a very important trait to include. |
53) |
Dependency |
Dependency, the tendency to
rely on one person for the majority of support and strength, can be extremely
detrimental to a relationship, regardless of whether the other person is also
dependent. Individuals who work on standing on their own two feet tend to
fare better in a relationship. |
54) |
Personal space need |
This scale identifies how
much freedom and time alone the test-takers need. This scale is especially important to help avoid hurt feelings
in a relationship, as it can be very disappointing if one partner wants to
spend more time apart than the other does.
|
55) |
Jealousy |
The jealousy scale refers to
whether the test-taker distrusts his or her partner's fidelity, and
whether jealousy has become an issue in his or her relationships. Jealousy rears its head in many
relationships, but if it gets out of control it can become a very serious
problem. |
56) |
Need for control |
Need for control refers to
how much the test-takers feel a need to be in charge. This is included because people with a
high need for control are often seen as domineering. |
57) |
Couple Friendships |
Couple friendships refers to
how often each subject wants to get together with friends as a couple, that is, how
involved in an outside social life the two should be together. Maintaining friendships is important for
multiple reasons, but especially because the consequences of not having
outside friendships can be so severe. |
58) |
Willingness to Rely on Others |
This scale measures how
accepting the test-taker is of outside help.
An unwillingness to rely on others has implications for how willing
s/he is to be vulnerable with a romantic partner. |
59) |
Coping skills |
Coping skills measures how
well the test-taker deals with stressful situations in his/her life. It is important because everyone goes
through stressful times, and being unable to cope can make individuals lash
out at others in an unpleasant way. |
60) |
Emotional intelligence |
Emotional intelligence
includes the ability to recognize our feelings and needs for what they really
are (e.g., realizing that we are “feeling rejected” rather than simply “upset”)
as well as the ability to regulate them (consoling ourselves when we are
feeling sad, or controlling our anger). Part
of emotional intelligence is the ease with which we cope with emotions in
others, and how well we can reconcile our immediate desires with our
long-term goals and the needs of other people. |
61) |
Mood stability |
This scale measures how well
the test-taker can roll with stress rather than becoming emotionally
unstable. Moodiness and a foul temper
can have a negative effect on relationships, if taken to an extreme. |
62) |
Tolerance of mood instability |
This scale evaluates the
degree to which the test-taker is comfortable around other people who have an
unstable mood. This is important to
identify those who would have difficulty being with people who often
experience mood instability. |
63) |
Trust |
An important aspect of
successful relationships is being able to trust a partner. This scale measures the extent to which
the individual is able to have faith that a romantic partner has their best
interests at heart and can be trusted.
|
64) |
Need for privacy |
As some individuals need both
physical and emotional privacy (a need to keep some secrets to themselves,
to have some space where they can be alone, or to store things they want to keep
to themselves), this scale is important to have in the TCT. |
65) |
Need for emotional intimacy |
A need for emotional
intimacy refers to how emotionally close one prefers to get with romantic
partners. Being in sync here is
important, because unequal levels can interfere with the long-term success of the
relationship. |
66) |
Appearance |
Used only for matching, this
scale measures how well-groomed and dressed the test-taker tends to be. Theoretically, this was included to take
into account the importance of appearance in attraction. |
67) |
Attentiveness |
Attentiveness refers to how
attentive the subject is to his or her partner while in a relationship. It is very important because, for someone
who needs a partner to dote on them a bit, a partner who is
not attentive would fall short. |
68) |
Self-esteem and Self-confidence |
Self-esteem and
self-confidence refers to how good the test-taker feels about him/herself,
and how confident s/he is about his/her abilities. This area is important both for individual
happiness and in relationships, and it is something that everyone should work
on if it is an issue. |
69) |
Gender roles |
The gender roles scale
assesses whether the subject holds conservative or progressive views of gender
roles. This is a key area of agreement for the short-term and, especially,
the long-term success of the relationship. |
70) |
Chivalry |
Chivalry has two different
measures, depending on the gender of the test-taker. For straight males, it measures whether he
likes to be chivalrous with the women he dates, opening doors, offering his
coat if she’s cold, etc. For straight
women, it measures whether or not she is receptive to chivalry. This scale is not administered to homosexual
men or women. |
71) |
Conflict proneness |
This scale measures whether
the test-taker’s personality leads him or her to be prone to conflict. Experiencing conflicts at the drop of a
hat is damaging and can interfere with the development of successful relationships
of all kinds, especially romantic ones. |
72) |
Conflict resolution |
The conflict resolution
scale evaluates whether the test-taker is able to deal well with
conflict. |
73) |
Ties with friends and family
|
The ties with friends and family
scale refers to how important the test-taker's social and familial ties are
to him/her. This is included in the
TCT because it is very important that partners be willing to support one
another's choices in this area. |
74) |
Parenting – Giving affection |
This scale measures the
degree to which the test-taker gives affection to his or her children (or plans to
in the future). This is included in
the TCT as a way to pair people with similar parenting styles. |
75) |
Parenting – Setting limits |
The setting limits scale
evaluates the degree to which test subjects are strict with their
children. Conflict can arise in
the long term if parents disagree about how much discipline to use with their
children. |
76) |
Attention seeking |
The attention seeking scale
assesses the degree to which the test-taker desires to be the center of
attention. |
77) |
Dating philosophy |
Dating philosophy assesses
where the test-taker stands in terms of what s/he wants to get out of a
relationship. Obviously, someone who
wants only a casual relationship is not a good match for someone ready to
settle down. |
78) |
Intellectualism
|
Intellectualism refers to
how driven the subject is to seek knowledge and learning in his/her
life. Interests, activities, and
philosophies of life are all influenced by this factor. |
Note 2. Individual Questions: Responses to some specific questions are used to match people with particular tastes; for instance, test-takers who indicate that romantic people are unattractive to them will definitely not be paired with hopeless romantics.
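The rule in Note 2 amounts to a hard exclusion filter applied on top of scale-based matching. The sketch below is hypothetical: the `Profile` fields, the 0 to 100 Romance score range, the `HOPELESS_ROMANTIC_CUTOFF` threshold, and the assumption that exclusion works in both directions are illustrative choices, not the TCT's proprietary matching logic.

```python
from dataclasses import dataclass

# Hypothetical answer codes for Question 86 ("People who are romantic ...").
APPEAL, OK, UNATTRACTIVE = "a", "b", "c"

# Hypothetical cut-off on a hypothetical 0-100 Romance scale (scale 48)
# at or above which a candidate is treated as a "hopeless romantic".
HOPELESS_ROMANTIC_CUTOFF = 80.0


@dataclass
class Profile:
    name: str
    romance_score: float   # hypothetical 0-100 score on the Romance scale
    finds_romantics: str   # hypothetical answer code for Question 86


def rules_out(seeker: Profile, candidate: Profile) -> bool:
    """True if the seeker's stated dislike of romantic people excludes the candidate."""
    return (seeker.finds_romantics == UNATTRACTIVE
            and candidate.romance_score >= HOPELESS_ROMANTIC_CUTOFF)


def may_be_paired(a: Profile, b: Profile) -> bool:
    """Assumed symmetric rule: a pair survives only if neither person rules out the other."""
    return not rules_out(a, b) and not rules_out(b, a)


alex = Profile("Alex", romance_score=20.0, finds_romantics=UNATTRACTIVE)
sam = Profile("Sam", romance_score=90.0, finds_romantics=APPEAL)
jo = Profile("Jo", romance_score=40.0, finds_romantics=OK)
print(may_be_paired(alex, sam))  # False: Alex dislikes romantics and Sam scores 90
print(may_be_paired(alex, jo))   # True
```

In a full matcher, hard filters of this kind would typically run before any compatibility score is computed, so that excluded pairs are never ranked at all.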
Individual Questions:
|
Question |
Description
|
79) |
Should
a husband and wife have separate bank accounts? a)
Yes. b)
Yes, but a joint
account for household bills. c)
No. |
This logistical question asks
whether the subject wants to have joint or
separate bank accounts. If a couple enters
a long-term commitment with differing views in this area, this could be
a source of conflict, and something that would require a great deal
of compromise. |
80) |
I am interested in: a)
A casual relationship. b)
Commitment - without
the paperwork of marriage. c)
Marriage to the right
person. d)
Marriage to anyone who
meets my basic requirements. e)
Marriage - I don’t care
to whom. |
This question is a
straightforward look at what kind of relationship the subject is looking for,
from casual to marriage. This is
very important for matching purposes because pairing people carefully
on the basis of this question can avoid a lot of disappointment. |
81) |
I
find someone rather messy … a)
Attractive b)
Acceptable c)
Unattractive |
This question evaluates how test-takers
feel about messiness in their romantic partner. By matching people who have strong feelings about this with
neat partners, many problems can be avoided down the road, should the
relationship work out. |
82) |
People
who have traditional beliefs about men and women a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question asks whether
the test-taker finds people who have traditional beliefs about men and women
appealing or unappealing. This has a strong
bearing on whether we should pair the test-taker with people who hold traditional views of gender roles. |
83) |
People who have a lot of
sexual experience a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question asks whether
the test-taker finds people who have a lot of sexual experience appealing or
unappealing. This has a strong
bearing on whether we should pair the test-taker with people who report having a lot of sexual
experience. |
84) |
People who think money is
really important a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question was written to
help determine whether the test-taker finds people who consider money important
appealing or unappealing. This helps
determine whether we should pair the test-taker with people who score high on the importance of money
scale. |
85) |
People who sometimes get
jealous a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question evaluates
whether the test-taker finds people who sometimes get jealous appealing or
unappealing. This way, they can be
paired with people who suit their preferences. |
86) |
People who are romantic a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question evaluates
whether the test-taker finds people who are romantic attractive or
unattractive. This way, they can be
paired with people who suit their preferences. |
87) |
People who are frugal a)
Appeal to me b)
Are OK with me c)
Are unattractive to me |
This question evaluates
whether the test-taker finds financially frugal people attractive or unattractive. This way, they can be paired with people
with whom they will get along. |
88) |
When dating someone, how
often do you expect to see him/her? a)
Every day b) Several times a week c)
Once a week d) Several times a month e)
Once a month f)
Less than once a month |
Another logistical question, this asks
how often, in general, the test-taker likes to see someone he or she is
dating. Pairing two people with
similar preferences in this area will help create compatible
couples. |
89) |
When I start dating someone
new, I expect to have sex: a)
On the first date b) Within a couple of weeks c)
Within a month d) Within several months e)
When I’m married |
Preferences for how quickly
one becomes physical with a new partner can vary widely. Pairing two people
who share the same inclination in this area can avoid a lot of friction. |
[1] Since these standards will
be referred to repeatedly, we adopt the convention of not giving the complete
reference each time. Instead, the standards are referred to by their numbering in
the 1999 edition of the Standards for Educational and Psychological Testing
as issued jointly by the AERA, APA, and NCME.
[2] The remaining 82 TCT factors
are either subsets or combinations of these seventeen factors.
[3] Note that the use of the word “adaptive” here differs from that in Computer Adaptive Testing (CAT) as items are not selected such that these optimally address respondents’ trait levels. Rather, in this context the term adaptive reflects that respondents may skip questions that were not intended for them (e.g., some questions are gender specific, and, say, men would skip the questions intended for women only).
[4] For reasons discussed
earlier, the psychometric analyses reported in later sections are based on a
subset of 218 items.
[5] Equation 1 is also sometimes
referred to as the one-parameter logistic model (Hambleton et al., 1991). While
this characterization is correct with respect to its mathematical form, the
Rasch model differs in terms of the parameter estimation procedures being used.
In particular, the Joint (i.e., unconditional) Maximum Likelihood Estimation
(JMLE) method used to fit the Rasch model does not require the assumption of a
particular distribution of the person parameters B (Mislevy & Bock,
1990). Accordingly, depending on the underlying distributions, the parameter
estimates obtained under the Rasch model and those obtained for the
one-parameter logistic model may vary considerably.
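For readers who want to see what JMLE involves, the following is a minimal sketch for the dichotomous Rasch model, P(X = 1) = exp(B - D) / (1 + exp(B - D)), where B is the person measure and D is an item location (the symbol D is our assumption). It alternates Newton-Raphson steps for persons and items and, as noted above, imposes no distribution on the person measures. It assumes complete 0/1 data, which is a simplification: the TCT uses rating-scale items (see note 12), and this is not the estimation software used for the analyses reported here. No small-sample bias correction is applied, although JMLE estimates are known to be biased for short tests.

```python
import math


def rasch_prob(b: float, d: float) -> float:
    """Dichotomous Rasch model: probability of a 1 for person measure b, item location d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))


def jmle(data, max_iter=1000, tol=1e-8, max_step=1.0):
    """Joint (unconditional) maximum likelihood estimation for complete 0/1 data.

    data: list of person rows, each a list of 0/1 item responses.
    Persons and items with extreme (zero or perfect) raw scores are rejected,
    because their maximum likelihood estimates are infinite.
    Returns (person_measures, item_locations) in logits, items centred at 0.
    """
    n_persons, n_items = len(data), len(data[0])
    person_raw = [sum(row) for row in data]
    item_raw = [sum(row[i] for row in data) for i in range(n_items)]
    if any(r in (0, n_items) for r in person_raw) or any(s in (0, n_persons) for s in item_raw):
        raise ValueError("remove persons/items with extreme scores before estimation")

    b = [0.0] * n_persons
    d = [0.0] * n_items
    for _ in range(max_iter):
        largest = 0.0
        # One Newton-Raphson step per person, holding item locations fixed.
        for n in range(n_persons):
            p = [rasch_prob(b[n], d[i]) for i in range(n_items)]
            step = (person_raw[n] - sum(p)) / sum(q * (1.0 - q) for q in p)
            step = max(-max_step, min(max_step, step))  # damp large early steps
            b[n] += step
            largest = max(largest, abs(step))
        # One Newton-Raphson step per item, holding person measures fixed.
        for i in range(n_items):
            p = [rasch_prob(b[n], d[i]) for n in range(n_persons)]
            step = (item_raw[i] - sum(p)) / sum(q * (1.0 - q) for q in p)
            step = max(-max_step, min(max_step, step))
            d[i] -= step  # a higher-than-expected item score means an easier item
            largest = max(largest, abs(step))
        # Fix the arbitrary origin of the logit scale: mean item location = 0.
        shift = sum(d) / n_items
        d = [x - shift for x in d]
        b = [x - shift for x in b]
        if largest < tol:
            break
    return b, d


# Hypothetical 6-person x 4-item data set with no extreme scores.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
persons, items = jmle(responses)
print([round(x, 2) for x in items])
```

At convergence, each person's and each item's expected raw score equals the observed one, which is why persons with the same raw score receive the same measure.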
[6] Strictly speaking, Rasch estimates and their standard errors (SE) cannot be
obtained for extreme (i.e., minimum or maximum possible) sum-scores. Hence, any comparisons involving
extreme scores are tenuous at best.
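The point of note 6 can be checked numerically. For the dichotomous Rasch model, the derivative of a person's log-likelihood with respect to B is the observed raw score minus the expected score. The three item locations below are hypothetical values chosen for illustration.

```python
import math


def score_residual(b, item_locations, raw_score):
    """d(log-likelihood)/dB for one person: observed raw score minus expected score."""
    expected = sum(1.0 / (1.0 + math.exp(-(b - d))) for d in item_locations)
    return raw_score - expected


items = [-1.0, 0.0, 1.0]          # hypothetical item locations (logits)
grid = [-10.0, -5.0, 0.0, 5.0, 10.0, 20.0]

# Perfect score (3 of 3): the residual stays positive everywhere, so the
# likelihood keeps increasing with B and no finite maximum (estimate) exists.
perfect = [score_residual(b, items, raw_score=3) for b in grid]

# Non-extreme score (2 of 3): the residual changes sign, so a finite
# maximum likelihood estimate lies where it crosses zero.
middle = [score_residual(b, items, raw_score=2) for b in grid]

print(perfect)
print(middle)
```

Because the test information, the sum of P(1 - P) over items, also tends to zero as B grows, the standard error 1/sqrt(sum of P(1 - P)) likewise has no finite value at the boundary.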
[7] An age of 35 years
represents the median age in the sample of respondents.
[8] Note that this criticism does not apply to factor analyses of entire subscales that consist of multiple items.
[9] This statement holds for all
Rasch models.
[10] In contrast to the
item-total correlations computed within the framework of classical test theory,
these correlations involve the Rasch dimension rather than the sum of the raw
scores.
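As a sketch of the distinction drawn in note 10, the function below correlates each item's scores with the persons' Rasch measures rather than with their raw-score totals. The responses and person measures in the usage lines are hypothetical placeholders; in practice the measures would come from a Rasch calibration. When respondents may skip items, as the TCT permits (note 3), raw totals are not comparable across persons, whereas Rasch measures remain on a common scale.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either sequence is constant


def item_measure_correlations(data, person_measures):
    """Correlate each item column with the Rasch person measures (not raw totals)."""
    return [pearson([row[i] for row in data], person_measures)
            for i in range(len(data[0]))]


responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
measures = [-0.2, -0.2, 1.1, 1.1, -1.6, -0.2]   # hypothetical person measures (logits)
print([round(r, 2) for r in item_measure_correlations(responses, measures)])
```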
[11] Within each table, the items are listed in order of their location parameters Bii.
[12] The smaller numbers of MM
and FF respondents caused some rating scale categories to go unused.
Hence, the observed score ranges for these groups were sometimes smaller than
those for the other groups.
[13] For efficiency, the text of
the AERA/APA/NCME Standards is shown paraphrased and/or abbreviated. Only
applicable standards are addressed (e.g., all standards pertaining to adaptive
testing or educational testing are omitted).
[14] For efficiency, the text of
the AERA/APA/NCME Standards is shown paraphrased and/or abbreviated. Only
applicable standards are addressed (e.g., all standards pertaining to adaptive
testing or educational testing are omitted).
[15] This paper is included in
Appendix D of the online Manual, and
a revised version is under review for publication.