Lets put the experimental question myth to bed... Forum

ndbigdave · Post by **ndbigdave** » Sun Sep 10, 2017 12:09 pm

I have seen a very consistent theme across a number of these bar-related threads about the "experimental questions." The common comments I see relate to how "random," "confusing," or "out of left field" these questions happen to be.

The problem with all of the above comments is...they simply cannot be true (at least in the case of 95%+ of the questions).

The "experimental questions" that are a part of the test aren't there to "try some random things out" but instead to test possible questions for future MBE examinations. The NCBE WANTS these questions to be good, fair questions. There is an actual science to drafting multiple-choice questions to make them "fair" in a number of ways (word choice, jargon, level of diction, the rule of law being tested, and whether the available answers are also "fair"). I know that Joe Seperac has linked out to other studies/comments about the MBE, but the test is very, very well researched and presented to achieve a thorough exam with each question generating a foundational amount of answers.

The main point is - is it possible that a question or two may be "bad" and not used in the future? Sure. But there is a narrative about the 10 and now 25 "experimental questions" being totally random, unfair and arbitrary...and that isn't the case. They are unscored questions that are being tested for future use - these are not questions being thrown against the wall.

SilvermanBarPrep · Post by **SilvermanBarPrep** » Sun Sep 10, 2017 1:14 pm

Quite true. The entire purpose of the experimental questions would be deemed null if they were recognized as such.

TXBar2017 · Post by **TXBar2017** » Mon Sep 11, 2017 10:29 am

ndbigdave wrote:I have seen a very consistent theme through a number of threads on these bar related threads about the "experimental questions." The common comments I see relate to how "random," "confusing," or "out of left field" these questions happen to be.

The problem with all of the above comments is...they simply cannot be true (at least in the case of 95%+ of the questions)

The "experimental questions" that are a part of the test arent to "try some random things out" but instead to test possible questions for future MBE examinations. The NCBE WANTS these questions to be good, fair questions. There is an actual science to drafting multiple choice questions to make them "fair" in a number of ways (word choice, jargon, level of diction, rule of law being tested and then that the available answers are also "fair"). I know that Joe Separac has linked out to other studies/comments about the MBE but the test is very, very well researched and presented to achieve a thorough exam with each question generating a foundational amount of answers.

The main point is - is it possible that a question or two may be "bad" and not used in the future? Sure. But there is a narrative about the 10 and now 25 "experimental questions" being totally random, unfair and arbitrary...and that isnt the case. They are unscored questions, that are being tested for future use - these are not questions being thrown against the wall.

I told myself the same thing. There was only like 1 question that I thought had to be an experimental because the wording of it just seemed off and confusing. I thought maybe I was just worn out from the test, but it was definitely worded weirdly. Other than that, you can't know because like you said, they want to potentially use these later.

dans1006 · Post by **dans1006** » Mon Sep 11, 2017 11:34 am

That's all well and true, but from my understanding, part of the qualifying process is gathering data and tweaking the questions to deal with potential issues in their wording. Prior to this refinement, there may be a higher level of ambiguity or room for misinterpretation. This may lead a taker to re-read the question multiple times in an effort to resolve that ambiguity with some detail they believe they must have missed. That means time that could be spent on other questions is instead spent on a question that will not even be scored. If you're a person who's finishing with an extra half hour each time, this might be inconsequential. If you're a person who is sprinting to finish, or someone who sometimes doesn't finish the entire test, this lack of refinement could lead you to miss questions that you might have answered more easily if you were not racing through them as the end of time approached.

Though I do agree that the effect of the additional experimental questions is being overhyped by people who are worried about their results, I also believe that there is a basis to argue that it may have had an effect. The most compelling piece of evidence, to me, is that on the first test where this change was made, the average dropped to the lowest level on record. I've not heard a sufficient explanation of why a swing that significant would happen in the absence of this variable. I am, of course, open to further debate of this issue. But it seems that the camps are dividing into the belief that it is of no consequence OR it's totally the only reason I failed. I suspect the truth lies somewhere in between these two poles.

ndbigdave · Post by **ndbigdave** » Mon Sep 11, 2017 12:12 pm

dans1006 wrote:That's all well and true, but from my understanding, part of the qualifying process is gathering data and tweaking the questions to deal with potential issues in the wording of the questions. Prior to this refinement, there may be a higher level of ambiguity or room for misinterpretation. This may lead a taker to re-read the question multiple times in an effort to resolve that ambiguity with some detail they believe they must have missed. This leads to additional time that could be used on other questions being used on a question that will not even be scored. If you're a person who's finishing with an extra half hour each time, this might be ineffectual. If you're a person who is sprinting to finish or someone who sometimes doesn't finish the entire test, this lack of refinement in the question could lead to the missing of questions that might have been more easily answered if they were not racing through them as the end of time approached.

Though I do agree that the effect of the additional experimental questions is being over hyped by people who are worried about their results, I also believe that there is a basis to argue that it may have had an effect. The most compelling piece of evidence, to me, is that on the first test where this change was made, the average dropped to the lowest level on record. I've not heard a sufficient explanation of why that significant of a swing would happen in the absence of this variable. I am, of course, open to further debate of this issue. But it seems that the camps are dividing into the belief that it is of no consequence OR it's totally the only reason I failed. I suspect the truth lies somewhere in between these two poles.

Not wanting to turn this into an argument thread in the slightest, and while I understand your point in paragraph 1, the easy counter to the statement is: pay attention to your time, don't spend too much time on any one question, and if a question doesn't make sense, skip it and return later. That is the best way to avoid wasting time on any question (experimental or otherwise) so that you complete the test. While I do agree that spending any time on a question that will result in no points seems unfair, the system is what it is. I had just noticed a recurring theme through post after post of people talking about these "experimental questions" as if they were from another planet and simply there to screw with test takers, which simply isn't the case and is a myth that should just die.

dans1006 · Post by **dans1006** » Mon Sep 11, 2017 1:58 pm

I suspect our positions are not THAT far apart. My position is that the addition of the extra 15 experimental problems does actually make the test harder for some people, particularly people who are having trouble exercising the discipline to pull the trigger and move on. The fact that there is a way to address such an increase in difficulty, and that it does not affect all students the way it does the more neurotic of our peers, doesn't mean it's not harder, though.

With that said, I also think the impact is probably overblown. You're absolutely correct that it is not intended to just mess with people. The questions are not supposed to be more tricky than others and their purpose is not to trip you up. But their purpose can be one thing and their effect can still be another.

ndbigdave · Post by **ndbigdave** » Mon Sep 11, 2017 2:32 pm

dans1006 wrote:I suspect our positions are not THAT far apart. My position is that the addition of the extra 15 experimental problems does actually make the test harder for some people, particularly people who are having trouble exercising the discipline to pull the trigger and move on. The fact that there is a way to address such an increase in difficulty and just because it does not effect all students in the way it does the more neurotic of our peers doesn't mean it's not harder though.

With that said, I also think the impact is probably overblown. You're absolutely correct that it is not intended to just mess with people. The questions are not supposed to be more tricky than others and their purpose is not to trip you up. But their purpose can be one thing and their effect can still be another.

I'll agree with your premise and can definitely think of at least one person for whom every possible second is valuable; to think they are wasting time on an unperfected question just seems wrong at its core.

What hits me first is the likelihood of one of these "bad" questions - one that is truly unperfected, poorly worded or whose available options aren't tweaked. I maintain it's rare, but with the bump from 10 to 25 the odds certainly increase.

Not a ton of disagreement between us. Your point is taken and of course applies to some people (though again, it does seem overblown). My point was not about how it may affect people but about the discussion relating to the experimental questions and how much nonsense was being put out there.

TXBar2017 · Post by **TXBar2017** » Mon Sep 11, 2017 5:34 pm

I'll say this though: on all of my simulated MBEs I was finishing super early, with roughly 30 min of time at the end, whereas I only finished with 5 min of time left on actual test day. Maybe I was just being more careful since it was the real deal, but it could also be that I spent longer on experimentals. We'll never know.

JoeSeperac · Post by **JoeSeperac** » Tue Sep 12, 2017 12:10 pm

NCBE started using experimental questions on the MBE in February 2007. When NCBE made this change, it was announced well in advance. NCBE normally doesn’t rush into things because the MBE is a high-stakes exam and any change could affect its reliability. For example, in 2008, Erica Moeser, the President of NCBE, stated: "If Civil Procedure finds its way into a future MBE (and the elaborate nature of our test development process suggests that it will be in the neighborhood of a couple of years before any change, if authorized, would be implemented), it will be necessary to revisit the current distribution of questions on each of the existing six topics. Currently, Contracts and Torts are represented by 33 questions each, with the other four topics accounting for 31 questions each, for a total of 190 scored items. The door will be open for us to consider what the ideal distribution of test content should be." See The Bar Examiner, November 2008. Civil Procedure was added to the MBE seven years later, in February 2015.

Because NCBE is usually methodical in what it does, I was surprised when it announced on August 31, 2016 that the MBE would go from 10 experimental questions to 25 experimental questions starting with the Feb 2017 exam. See http://seperac.com/pdf/2016_0831_moeser ... scores.pdf

As someone who has been tracking everything NCBE has done for years, this was an abrupt and out-of-character change that they seemingly tried to regard as unimportant by putting it at the bottom of a memo. In my opinion, the only reason NCBE would change the exam in a direction that reduces its reliability (and do it rather suddenly) is due to concern regarding the discussion of the MBE questions on forums. A wider range of new experimental questions makes it harder to know what is scored versus what is not.

Unless NCBE changed its method of developing MBE questions, the experimental questions should be indistinguishable from the graded questions. According to NCBE, content experts from NCBE's drafting committee participate in twice-a-year meetings where questions are reviewed and modified before they are used on a live exam (The Bar Examiner, February 2007). Even when these new questions are actually put on an exam, they start as ungraded test items (now 25 out of 200 MBE questions are ungraded). This is to ensure that each test item is unambiguous, accurate and psychometrically sound.

I regard the bigger impact to be the reduced reliability of the MBE. Fewer graded questions means lower reliability, which means you are less likely to be consistent in your MBE score. The examinees who benefit from this change are the ones with lower MBE ability. For example, if you are playing someone in 1-on-1 basketball, the weaker player is better off agreeing to a short game against a stronger player (e.g. 7 points to win) because the weaker player has a better chance of getting lucky and winning. Conversely, if you are the stronger player, you want to agree to a longer game (e.g. 21 points to win) to give yourself the best chance at winning (because the longer the game, the more opportunities you have to demonstrate your higher ability). However, the MBE component of the exam is so reliable to begin with that the change from 190 to 175 is probably a non-issue (e.g. the weaker basketball player probably has only a negligibly better chance of winning by playing to 175 versus playing to 190).
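To put rough numbers on the basketball analogy, here is a minimal sketch. It assumes each point is an independent trial and that the weaker player wins any given point with a fixed probability (0.45 is an arbitrary illustrative value, not anything from NCBE). The function name is made up for this example.

```python
from math import comb


def weaker_player_win_probability(p_point: float, target: int) -> float:
    """Chance that a player who wins each point with probability p_point
    is the first to reach `target` points (points assumed independent)."""
    q = 1.0 - p_point
    # The player wins while the opponent is stuck on k points (k < target):
    # the final point is theirs, and the earlier target-1+k points contain
    # exactly k opponent points in any order.
    return sum(
        comb(target - 1 + k, k) * p_point**target * q**k
        for k in range(target)
    )


# Assumed per-point win chance for the weaker player (illustrative only).
P_WEAKER = 0.45

for target in (7, 21, 175, 190):
    chance = weaker_player_win_probability(P_WEAKER, target)
    print(f"first to {target}: weaker player wins {chance:.4f} of the time")
```

Under these assumptions the weaker player's chance shrinks as the game gets longer, and the gap between playing to 175 and playing to 190 is small compared with the gap between 7 and 21.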

The fact that only 175 MBE questions are scored is not a bad thing for examinees who put a significant portion of their study-time into the MBE (which you should be doing). It will hurt the examinees who put less time into MBE study/practice (because each question they miss due to their reduced MBE study/practice carries a greater weight). Since only 175 MBE questions are now graded on the UBE exam, you want to answer a minimum of 110 of these graded questions correctly (63%) to give yourself a good chance at passing.

By the way, according to NCBE, the July 2017 MBE average rose 1.4 points over the July 2016 average which bodes well for pass rates.
See http://www.law.com/sites/almstaff/2017/ ... her-hands/

InterAlia1961 · Post by **InterAlia1961** » Tue Sep 12, 2017 5:29 pm

Thank you for posting this. It may sound silly, but having some idea how the test is formulated helps to calm my nerves. I know that on some forums right after the exam, people were discussing some of the harder questions, with many postulating that they must've been experimental questions. But I doubt it. I don't think the examinee can tell. That's why I did my level best on each one.

Toubro · Post by **Toubro** » Tue Sep 12, 2017 10:46 pm

JoeSeperac wrote: I regard the bigger impact to be the reduced reliability of the MBE. Fewer graded questions means lower reliability, which means you are less likely to be consistent in your MBE score. The examinees who benefit from this change are the ones with lower MBE ability. For example, if you are playing someone in 1-on-1 basketball, the weaker player is better off agreeing to a short game against a stronger player (e.g. 7 points to win) because the weaker player has a better chance of getting lucky and winning. Conversely, if you are the stronger player, you want to agree to a longer game (e.g. 21 points to win) to give yourself the best chance at winning (because the longer the game, the more opportunities you have to demonstrate your higher-ability). However, the MBE component of the exam is so reliable to begin with that the change from 190 to 175 is probably a non-issue (e.g. the weaker basketball player probably has a negligibly better chance of winning by playing to 175 versus playing to 190).

Are you sure about the impact on reliability? The reliability of the MBE has usually been around 0.9, and it seems unchanged even now. In fact, the reliability of the Feb. 2017 MBE with 25 pretest questions — the latest MBE for which the NCBE has released information — was 0.92, the highest for any Feb. MBE. That's from the latest Bar Examiner in Moeser's column (may she have a restful retirement lol).

Also, re: the OP's gripe that started this whole thread. Misinformation about what experimental questions are starts to spread when the Princeton Review and other SAT prep companies find a way to simplify the test and its notorious experimental section to 10th graders. After people make up their minds about them, that negative attitude proves to be unfortunately tenacious.

JoeSeperac · Post by **JoeSeperac** » Wed Sep 13, 2017 9:50 am

Toubro wrote: Are you sure about the impact on reliability? The reliability of the MBE has usually been around 0.9, and it seems unchanged even now. In fact, the reliability of the Feb. 2017 MBE with 25 pretest questions — the latest MBE for which the NCBE has released information — was 0.92, the highest for any Feb. MBE. That's from the latest Bar Examiner in Moeser's column (may she have a restful retirement lol).

Also, re: the OP's gripe that started this whole thread. Misinformation about what experimental questions are starts to spread when the Princeton Review and other SAT prep companies find a way to simplify the test and its notorious experimental section to 10th graders. After people make up their minds about them, that negative attitude proves to be unfortunately tenacious.

Longer tests yield more reliable scores than shorter tests. In a 2004 paper on the MBE, the author stated that “The coefficient alpha measure of reliability for each section (100 items) was .80. Using the Spearman-Brown formula, the reliability of the entire test (200 items) would be estimated at .89 and is consistent with other licensure examinations that cover a broad domain.”
See http://journals.sagepub.com/doi/abs/10. ... 4405282483

Based on the above, if a 100 question MBE exam has a reliability of .80, a 175 question MBE exam would have a reliability of .875 while a 190 question MBE exam would have a reliability of .883 using the Spearman-Brown formula. This means that with a 175 question MBE exam, 76.5% of the difference between two applicants' scores on one form could have been predicted by their respective scores on a prior form while for the 190 question MBE exam, 78% of the difference between their scores on one form could have been predicted by their respective scores on the prior form. This isn't a big difference, but the reliability of the MBE has decreased due to the extra experimentals.
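For anyone who wants to check the arithmetic, here is a small sketch of the Spearman-Brown calculation. It takes the .80 per 100 items from the 2004 paper as the starting point and assumes, as the formula does, that added or removed items behave like the ones already on the test. The function and constant names are just for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by `length_factor`,
    assuming the new items are comparable to the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


BASE_ITEMS = 100
BASE_RELIABILITY = 0.80  # coefficient alpha per 100-item section (2004 paper)

for scored_items in (175, 190, 200):
    r = spearman_brown(BASE_RELIABILITY, scored_items / BASE_ITEMS)
    # r squared is the "percent predictable" figure used in the post above
    print(f"{scored_items} scored items: reliability {r:.3f}, squared {r * r:.3f}")
```

This gives 0.875 for 175 items and roughly 0.884 for 190 items (the .883 above is the same value truncated rather than rounded), so the predicted drop from losing 15 scored items is under one hundredth.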

Toubro · Post by **Toubro** » Wed Sep 13, 2017 3:30 pm

JoeSeperac wrote:
Toubro wrote: Are you sure about the impact on reliability? The reliability of the MBE has usually been around 0.9, and it seems unchanged even now. In fact, the reliability of the Feb. 2017 MBE with 25 pretest questions — the latest MBE for which the NCBE has released information — was 0.92, the highest for any Feb. MBE. That's from the latest Bar Examiner in Moeser's column (may she have a restful retirement lol).

Also, re: the OP's gripe that started this whole thread. Misinformation about what experimental questions are starts to spread when the Princeton Review and other SAT prep companies find a way to simplify the test and its notorious experimental section to 10th graders. After people make up their minds about them, that negative attitude proves to be unfortunately tenacious.
Longer tests yield more reliable scores than shorter tests. In a 2004 paper on the MBE, the author stated that “The coefficient alpha measure of reliability for each section (100 items) was .80. Using the Spearman-Brown formula, the reliability of the entire test (200 items) would be estimated at .89 and is consistent with other licensure examinations that cover a broad domain.”
See http://journals.sagepub.com/doi/abs/10. ... 4405282483

Based on the above, if a 100 question MBE exam has a reliability of .80, a 175 question MBE exam would have a reliability of .875 while a 190 question MBE exam would have a reliability of .883 using the Spearman-Brown formula. This means that with a 175 question MBE exam, 76.5% of the difference between two applicants' scores on one form could have been predicted by their respective scores on a prior form while for the 190 question MBE exam, 78% of the difference between their scores on one form could have been predicted by their respective scores on the prior form. This isn't a big difference, but the reliability of the MBE has decreased due to the extra experimentals.

I understood that math, but I'm trying to reconcile that with NCBE's own publications of reliability, which suggest that reliability has improved rather than decreased even though the number of scored items has decreased.

Surely the rule that "longer tests yield more reliability" is based on the assumption that no other parameters change. That has to be true because I can write a 600-item practice MBE personally, and I'm willing to bet it will be less reliable than NCBE's 175-item test.

So while the test's shortened length reduces reliability, perhaps what explains NCBE's data is that the quality of the items themselves is improving.
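One way to see how much item quality would have to improve is to run Spearman-Brown backwards. This is only a sketch: it assumes the 0.92 figure quoted above applies to a 175-scored-item test and that the formula's assumptions hold, neither of which NCBE has confirmed in this thread. The function names are made up for the example.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def implied_base_reliability(observed: float, length_factor: float) -> float:
    """Invert Spearman-Brown: reliability of the shorter base test that
    would produce `observed` reliability at `length_factor` times its length."""
    return observed / (length_factor - (length_factor - 1) * observed)


REPORTED_RELIABILITY = 0.92  # Feb. 2017 figure cited in this thread
SCORED_ITEMS = 175           # assumed number of scored items behind that figure

per_100 = implied_base_reliability(REPORTED_RELIABILITY, SCORED_ITEMS / 100)
print(f"implied reliability per 100 items: {per_100:.3f}")
print(f"check, scaled back to 175 items: {spearman_brown(per_100, 1.75):.3f}")
```

Under those assumptions the implied per-100-item reliability comes out noticeably above the .80 reported in the 2004 paper, which is consistent with the idea that better items can more than offset a shorter scored test.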

InterAlia1961 · Post by **InterAlia1961** » Thu Sep 14, 2017 10:22 am

Maybe this can help explain how a longer test, or a test with more scored questions, improves reliability. It's about odds. If you're a poker player, you know to stay away from the three-card games and blackjack at casinos. They're a fool's game. The better games are Texas Hold 'em, Seven-card Stud, and Mississippi Stud. The reason is that the more cards that are thrown, the more likely that a player and not the house is going to win. The three-card poker game coupled with the Pair Plus bet is a sure way to lose your ass. In fact, you can tell a lot about a casino by looking at their card tables, even if you're there to play slots. If you only see the three-card and BJ tables open, that casino is probably not where you want to gamble. They stack the table games in their favor. They likely have the slots tied down as well. This is the same with the MBE. The fewer questions, the greater the likelihood that the outcome will favor the house...in this case, the State Bar of California, which requires a 1440 cut score.

himanhi · Post by **himanhi** » Thu Sep 14, 2017 1:09 pm

InterAlia1961 wrote:Maybe this can help understand how a longer test, or a test with more scored questions, improves reliability. It's about odds. If you're a poker player, you know to stay away from the three-card games and Black Jack at casinos. They're a fool's game. The better games are Texas Hold 'em, Seven-card Stud, and Mississippi Stud. The reason is that the more cards that are thrown, the more likely that a player and not the house is going to win. The tree-card poker game coupled with the Pair Plus bet are a sure way to lose your ass. In fact, you can tell a lot about a casino by looking at their card tables, even if you're there to play slots. If you only see the three-card and BJ tables open, that casino is probably not where you want to gamble. They stack the table games in their favor. They likely have the slots tied down as well. This is the same with the MBE. The fewer questions, the greater the likelihood that the outcome will favor the house...in this case, the State Bar of California, which requires a 1440 cut score.

?? I think you're trying to equate probability and volatility with reliability, and volatility swings both ways in a normal distribution, so it doesn't help the analogy...

Do we know how the examiners reached the new reliability figure? Seems unlikely that they had a treasure trove of higher quality questions to drop as soon as they changed formats. Or maybe I'm being a nut.

Toubro · Post by **Toubro** » Thu Sep 14, 2017 2:19 pm

himanhi wrote:
InterAlia1961 wrote:Maybe this can help understand how a longer test, or a test with more scored questions, improves reliability. It's about odds. If you're a poker player, you know to stay away from the three-card games and Black Jack at casinos. They're a fool's game. The better games are Texas Hold 'em, Seven-card Stud, and Mississippi Stud. The reason is that the more cards that are thrown, the more likely that a player and not the house is going to win. The tree-card poker game coupled with the Pair Plus bet are a sure way to lose your ass. In fact, you can tell a lot about a casino by looking at their card tables, even if you're there to play slots. If you only see the three-card and BJ tables open, that casino is probably not where you want to gamble. They stack the table games in their favor. They likely have the slots tied down as well. This is the same with the MBE. The fewer questions, the greater the likelihood that the outcome will favor the house...in this case, the State Bar of California, which requires a 1440 cut score.
?? I think ur trying to equate probability and volatility with reliability, and volatility swings both ways in a normal distribution so it doesnt help the analogy...

Do we know how the examiners reached the new reliability figure? Seems unlikely that they had a treasure trove of higher quality questions to drop as soon as they changed formats. Or maybe im being a nut

That's a very good question. But only one of two things is happening based on the reliability info the NCBE released: 1. The decreased number of scored questions isn't negatively affecting reliability (armchair statistics notwithstanding), or 2. They're lying.

anabasis · Post by **anabasis** » Thu Sep 14, 2017 2:58 pm

Toubro wrote:
himanhi wrote:
InterAlia1961 wrote:Maybe this can help understand how a longer test, or a test with more scored questions, improves reliability. It's about odds. If you're a poker player, you know to stay away from the three-card games and Black Jack at casinos. They're a fool's game. The better games are Texas Hold 'em, Seven-card Stud, and Mississippi Stud. The reason is that the more cards that are thrown, the more likely that a player and not the house is going to win. The tree-card poker game coupled with the Pair Plus bet are a sure way to lose your ass. In fact, you can tell a lot about a casino by looking at their card tables, even if you're there to play slots. If you only see the three-card and BJ tables open, that casino is probably not where you want to gamble. They stack the table games in their favor. They likely have the slots tied down as well. This is the same with the MBE. The fewer questions, the greater the likelihood that the outcome will favor the house...in this case, the State Bar of California, which requires a 1440 cut score.
?? I think ur trying to equate probability and volatility with reliability, and volatility swings both ways in a normal distribution so it doesnt help the analogy...

Do we know how the examiners reached the new reliability figure? Seems unlikely that they had a treasure trove of higher quality questions to drop as soon as they changed formats. Or maybe im being a nut
That's a very good question. But only one of two things is happening based on the reliability info the NCBE released: 1. The decreased number of scored questions isn't negatively affecting reliability (armchair statistics notwithstanding), or 2. They're lying.

Alternatively, 3. Their methodology for estimating reliability is not accurate or has been cherry-picked to show the desired outcome. Since each administration is a different set of questions to a different set of test-takers, reliability can't be directly measured but must be estimated. Several methods exist, and the estimated reliability can differ between different methods. Even tracking repeat test takers would have serious limitations, since repeaters aren't necessarily representative of the whole pool, and presumably study between sittings can affect your results.

Because there isn't a lot of transparency here, it is hard to be sure what is going on under the hood.

Smiddywesson · Post by **Smiddywesson** » Tue Mar 19, 2019 4:02 pm

This is an old post, but maybe somebody can explain to me how going from a 190-question test to a 175-question test can affect reliability, and yet the examiners use just 30 questions to equate the test to each of the two previous groups of examinees (30 equators from February and 30 from July)?

My take on all this was the examiners got spooked by the cheaters, so they upped the ante to 25 pretest questions instead of ten, delivered in 10 forms of the test rather than 8. The end result was they had 250 pretest questions validated on an actual test rather than just 80, and now they could cycle the questions faster than the vermin could steal them.
