Differences between revisions 71 and 85 (spanning 14 versions)

This page describes the reviewing "algorithm" for each of the two ESA 2018 Track B PCs.

Because of the experiment and because it's a good idea anyway, we try to specify it beforehand. We will use a relatively standard algorithm: three independent reviews per submission with a score from {−2, −1, 0, +1, +2} each, followed by a discussion phase, where the reviewers discuss with each other and submissions are proposed for acceptance or rejection in rounds.

Though standard and used in many program committees, it is not easy to give a full specification of this algorithm. The basic procedure is clear, but there are many eventualities, most of which will not happen, but some of which will, and it's hard to say in advance which. There is also a fair amount of complex human judgement involved that is hard to formalize. And there are many variations.

We try to be as specific as possible, without being overly complicated or impractical. The result is not a 100% complete and precise specification of the process. We will fill in the gaps and fix problems in a reasonable way as we go along. As far as the experiment is concerned, these conditions are not perfect, but reasonable given the complexity of the process and the agents involved. That being said, please let me know if you see a way to improve the following specification.

Contents

Schedule
Reviews
Discussion Phase
Change Log

Schedule

The total time for the reviewing process (from the submission deadline to the author notification) is 8 weeks. The reviewing process proceeds in the following phases:

1. The deadline for submissions is April 22 AoE (strict)
2. Bidding and paper assignment: 1 week (~ April 23 - April 29)
3. Reviewing: 4 weeks (~ April 30 - May 27)
4. Discussion and recalibration of reviews: 2 weeks (~ May 28 - June 10)
5. Buffer for things going wrong or taking longer than expected: 1 week
6. The notification deadline is June 18 (maybe earlier)
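The phase boundaries above follow from the phase lengths alone. As a minimal sketch (the function and variable names are illustrative, and the year 2018 is taken from the conference name), the dates can be recomputed like this:

```python
from datetime import date, timedelta

# Phase lengths in weeks, as listed in the schedule above.
PHASES = [
    ("Bidding and paper assignment", 1),
    ("Reviewing", 4),
    ("Discussion and recalibration of reviews", 2),
    ("Buffer", 1),
]

def phase_dates(first_day):
    """Return (name, first day, last day) for consecutive phases."""
    result = []
    start = first_day
    for name, weeks in PHASES:
        end = start + timedelta(weeks=weeks) - timedelta(days=1)
        result.append((name, start, end))
        start = end + timedelta(days=1)
    return result

# Bidding starts the day after the submission deadline (April 22).
schedule = phase_dates(date(2018, 4, 23))
for name, start, end in schedule:
    print(f"{name}: {start:%B %d} - {end:%B %d}")
```

Under these assumptions the buffer week ends on June 17, one day before the notification deadline.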

Reviews

We expect around 50 submissions. Each submission should receive 3 reviews (more reviews are possible, but this is the exception). Since each of the two PCs has exactly 12 members, this is an expected load of around 12 submissions per PC member.
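The load estimate is simple arithmetic. A small sketch, assuming that each PC reviews all submissions on its own (the reading under which the numbers above come out to about 12); the function name is illustrative:

```python
def reviews_per_member(submissions, reviews_each, pc_members):
    """Average number of reviews each PC member has to write."""
    return submissions * reviews_each / pc_members

load = reviews_per_member(submissions=50, reviews_each=3, pc_members=12)
print(load)
```

With 50 submissions this gives 12.5 reviews per member; the real load depends on the actual number of submissions.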

Sub-reviewers and Conflicts of Interest

We recommend that you review the submissions yourself, but you may ask sub-reviewers for some of the submissions if you prefer to do so. In any case, you should familiarize yourself with each submission assigned to you and its review, so that you can have a competent discussion with the other PC members. The discussion phase is an essential part of the reviewing process.

During the bidding phase you can and should also identify any Conflict of Interest (CoI) with any submission. If you make use of sub-reviewers, you should make sure that they do not have a CoI either. In case of doubt, they should write something about this as part of their review, and the respective PC member should add this part in the Comments to the PC field in EasyChair. Typical cases for a CoI for a submission are:

1. One of the authors is your relative/significant other
2. One of the authors has been your advisor or PhD student in the last 10 years
3. One of the authors comes from the same department
4. You feel there is a CoI for another reason (e.g. you have, say, many joint publications)

Guidelines for the Review Text

Each review should provide the following information:

1. A short summary of the main contribution(s) of the submission in the words of the reviewer
2. An itemized list of the strengths and weaknesses of the submission
2.1 The strengths should be numbered (S1), (S2), ...
2.2 The weaknesses should be numbered (W1), (W2), ...

The purpose of the numbering is so that it is easy to reference these items in the discussion. The numbering can but does not have to express a relative ranking of the strengths and weaknesses.

Each review can also provide the following information (the authors will be grateful to you):

3. More detailed explanations of the strengths and weaknesses
4. Comments to the authors for improving the paper

You can change your review text in the discussion phase. However, the discussion (and the whole reviewing process) will not work if your initial review is not substantial.

Guidelines for the Review Score

Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the discussion phase. You can change your scores during the discussion phase, but it will greatly help the efficiency and quality of the process if you already hit the "right" score for a paper in your initial review.

Score	Assessment of submission	Behavior during discussion
+2 (accept)	Good fit and no major weaknesses	I would champion this paper and fight against rejection
+1 (weak accept)	Significant weaknesses, but still acceptable	I would support this paper, but not fight against rejection
0 (borderline)	Hovering between +1 and −1	Not sure yet about the severity of the weaknesses / the threshold for ESA
−1 (weak reject)	Significant weaknesses, lean to reject	I would not support this paper, but not fight against acceptance
−2 (reject)	Bad fit or major weaknesses	I would oppose this paper and fight against acceptance

Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).

Some conferences disallow the borderline score of 0, to force a clear opinion from the reviewer. In the discussion phase, we indeed ask reviewers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the hovering between +1 and −1 in the table above.

Reviewers might not be fully aware yet of their behavior during the discussion phase for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. It is exactly one of the tasks of the discussion phase to bring the final scores (and reviews) closer to what they are supposed to reflect.
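The score scale from the table above can be written down as a small enumeration. This is only a sketch; the names `Score` and `needs_commitment` are illustrative and not part of EasyChair or of the process itself:

```python
from enum import IntEnum

class Score(IntEnum):
    """The five review scores from the table above."""
    REJECT = -2
    WEAK_REJECT = -1
    BORDERLINE = 0
    WEAK_ACCEPT = 1
    ACCEPT = 2

def needs_commitment(score):
    """True if the score is 0 and has to be moved away from 0 in the discussion phase."""
    return Score(score) is Score.BORDERLINE

print(needs_commitment(0), needs_commitment(1))
```

Values outside the scale, such as +3, are rejected by `Score(...)` with a `ValueError`, which matches the decision not to use +3 and −3.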

Discussion Phase

The discussion phase starts as soon as all the reviews are in. It lasts approximately two weeks; see the schedule above.

Beginning of the Discussion Phase

At the beginning of the discussion phase, each PC member should do the following (all the discussion and communication happens within EasyChair):

1. Read the reviews from the other reviewers
2. Comment on contrary arguments or ask questions if something is unclear
3. Adapt your review and possibly the score to what you have learned from the discussion
4. If your initial score was 0, change it away from 0 based on what you have learned from the other reviews and from the discussion

Groups of submissions

To specify the decision process, it is useful to define the following partitioning into groups. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group. This will be described in more detail in the next section.

Group A1 : clear support (will probably be accepted)
Group A2 : at least one champion + weak support from the others (good chance to be accepted)
Group A3 : only weak support from everybody (might be accepted if room)

Group C1 : weak support + strong opposition (resolve or move to R2)
Group C2 : strong support + weak opposition (resolve or vote in the end)
Group C3 : strong support + strong opposition (resolve or vote in the end)

Group R1 : strong opposition (will probably be rejected)
Group R2 : mix of strong and weak opposition (will probably be rejected)
Group R3 : weak opposition from everybody (will probably be rejected)

Group X : two of the reviews are missing or completely lack substance (acquire missing/additional reviews)

The assignment of a submission to one of these groups will not be done by score alone, but also based on what is written in the reviews. Of course, there will be a strong correlation to the scores. In fact, if the scores were perfect, the correlation would be perfect. But it lies in the nature of the process that some reviewers (and PC members) are unsure about a submission or about the threshold for ESA. So one important part of the discussion phase is to bring the scores closer to what they are intended to reflect.

For example, a submission with scores {2, 2, 2} will probably be in Group A1 (unless the support expressed in the reviews is weaker than it might appear from the scores, in which case Group A2 might be more appropriate), and a submission with only negative scores will probably be in Group R1 (unless the reviews are more positive about the paper than it might appear from the scores, in which case Groups R2 or C1 might be more appropriate).
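As a rough illustration of the correlation between scores and groups, the sketch below derives a first guess at the group from the scores alone. It is not the actual assignment procedure, which also takes the review texts into account. The function name and the handling of cases that the group list does not cover (scores of 0, only two reviews, weak support mixed with weak opposition) are assumptions made for this example.

```python
def tentative_group(scores):
    """First guess at a group from the scores alone, or None if they do not settle it."""
    if any(s not in (-2, -1, 0, 1, 2) for s in scores):
        raise ValueError("scores must be in {-2, -1, 0, +1, +2}")
    if len(scores) <= 1:
        return "X"      # at least two of the three expected reviews are missing
    if len(scores) < 3 or 0 in scores:
        return None     # incomplete, or a reviewer has not committed yet
    strong_for, weak_for = 2 in scores, 1 in scores
    strong_against, weak_against = -2 in scores, -1 in scores
    if not (strong_against or weak_against):   # support only
        if not weak_for:
            return "A1"
        return "A2" if strong_for else "A3"
    if not (strong_for or weak_for):           # opposition only
        if not weak_against:
            return "R1"
        return "R2" if strong_against else "R3"
    if strong_for and strong_against:
        return "C3"
    if strong_for:
        return "C2"     # strong support + weak opposition
    if strong_against:
        return "C1"     # weak support + strong opposition
    return None         # weak support + weak opposition: not covered by the list

print(tentative_group([2, 2, 2]), tentative_group([1, 1, -2]))
```

A result of `None` means that the scores alone are not enough and the reviews have to be read, which is the normal case for anything borderline.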

Submissions can change groups at any time due to the ongoing discussions and corresponding changes in the reviews and/or scores.

The group assignment of a submission can also be challenged by other PC members (who did not write one of the three original reviews for the submission). For example, if another PC member formulates an argument against a submission from Group A2, that submission will go into Groups C2 or C3.

No decision is final until the end of the discussion phase.

Decision Process (Rounds)

After the preparation above (or partly in parallel to it), the discussion will proceed in rounds. Each round lasts several days. In each round, the PC chair will suggest certain submissions for acceptance and others for rejection. In EasyChair, these submissions will be marked accept? and reject?. PC members can challenge these suggestions until the next round. In each round after the first, submissions that were marked accept? or reject? in the previous round and that were not challenged will be marked ACCEPT and REJECT, respectively. If nobody challenges these decisions anymore, they will become the final decisions for these submissions.
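The marking rule can be sketched as a small transition function. The text above does not say what exactly happens to a challenged submission, so returning `None` (back to open discussion) in that case is an assumption of this example, as is the function name:

```python
def next_mark(mark, challenged):
    """Mark of a submission at the start of the next round.

    mark is one of "accept?", "reject?", "ACCEPT", "REJECT" or None (no mark yet).
    """
    if challenged:
        return None         # assumption: a challenge reopens the discussion
    if mark == "accept?":
        return "ACCEPT"
    if mark == "reject?":
        return "REJECT"
    return mark             # unmarked submissions and unchallenged decisions stay as they are

print(next_mark("accept?", challenged=False))
```

Because ACCEPT and REJECT can still be challenged, the function treats them like any other mark; they only become final at the end of the discussion phase.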

Submissions that have changed groups will be treated as they would have been treated within the new group in a previous round. For example, if for a submission from Group C2 (strong support + weak opposition) the opposition crumbles, the submission moves to Group A2 and will be suggested for accept? in the next round. Or, if for a submission from Group C1 (weak support + strong opposition) the support crumbles, the submission moves to Group R2 and will be suggested for reject? in the next round.

Round 1 : A1 → accept?, R1 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 2 : A2 → accept?, R2 → reject?, A3 and C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 3 : R3 → reject?, A3 → push for champion, C1 → reject?, C2 and C3 → like in Round 2
Round 4 : A3 and C2 and C3 → send email to PC with short summary for each of these + call for vote
Round 5 : Suggestion for final decisions
Round 6 : Finalize decisions
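The first four rounds in the list above can be captured in a lookup table. This is a sketch with illustrative names; in particular, `latest_action` is one possible reading of the rule that a submission which changed groups is treated as its new group would have been treated in a previous round:

```python
# Actions per round and group, copied from the list above (Rounds 5 and 6 are not group-specific).
ROUNDS = {
    1: {"A1": "accept?", "R1": "reject?", "C1": "push for champion",
        "C2": "challenge opposition", "C3": "push for resolution"},
    2: {"A2": "accept?", "R2": "reject?", "A3": "push for champion",
        "C1": "push for champion", "C2": "challenge opposition",
        "C3": "push for resolution"},
    3: {"R3": "reject?", "A3": "push for champion", "C1": "reject?",
        "C2": "challenge opposition", "C3": "push for resolution"},
    4: {"A3": "summary + call for vote", "C2": "summary + call for vote",
        "C3": "summary + call for vote"},
}

def action(round_no, group):
    """Action listed for this group in this round, or None if none is listed."""
    return ROUNDS.get(round_no, {}).get(group)

def latest_action(round_no, group):
    """Most recent action listed for this group in this round or an earlier one."""
    for r in range(round_no, 0, -1):
        if group in ROUNDS.get(r, {}):
            return ROUNDS[r][group]
    return None

print(action(3, "A2"), latest_action(3, "A2"))
```

For a submission that moves from C2 to A2 during Round 2, Round 3 lists no action for A2, but `latest_action(3, "A2")` falls back to the Round 2 entry and yields accept?.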

There should be as few submissions as possible left in Groups C2 and C3 by the end of Round 3. The vote is really just an emergency measure for submissions where, despite all attempts, the controversy could not be resolved.

At the beginning of each round, the approximate number of "free slots" (total number of submissions that can be accepted minus the number of submissions already marked ACCEPT) will be announced. This can of course influence the discussions and the fates of the submissions.
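The number of free slots is a simple difference. In the sketch below, the capacity of 15 is a made-up number for illustration only; the actual number of submissions that can be accepted is not stated on this page.

```python
def free_slots(capacity, marks):
    """Capacity minus the number of submissions already marked ACCEPT."""
    return capacity - sum(1 for m in marks if m == "ACCEPT")

# Hypothetical state after some round: suggestions (accept?) do not count yet.
marks = ["ACCEPT", "accept?", "REJECT", "ACCEPT", None]
slots = free_slots(capacity=15, marks=marks)
print(slots)
```

Only the approximate value would be announced, so that PC members cannot infer the status of submissions for which they have a CoI.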

Change Log

A link to this web page was first sent to the PCs on April 20 00:41 UTC. The following list mentions only significant changes in content. Purely editorial changes or improvements in formulations which do not change the meaning significantly are not logged.

April 20 16:30 UTC: added information about conflicts of interest (for PC members and sub-reviewers) and added "approximate" to the last paragraph in the description of the discussion phase to make sure that PC members cannot deduce information about their CoI submissions from the current status quo.

April 21 15:12 UTC: made partition into groups slightly more fine-grained and more complete. Adapted the descriptions of the rounds accordingly.

-  ⇤ ← Revision 71 as of 2018-04-20 00:59:49 → 
  Size: 11993
  Editor: Hannah Bast
  Comment:
+   ← Revision 85 as of 2018-04-21 16:14:26 → ⇥
  Size: 14167
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-This page describes the reviewing "algorithm" for each of the two ESA 2018 Track B PCs.
+This page describes the reviewing "algorithm" for each of the two [[ESA2018Experiment/SelectionOfPCs|ESA 2018 Track B PCs]].
 Line 29:
-== Sub-reviewers ==
+== Sub-reviewers and Conflicts of Interest ==
 Line 32:
+During the bidding phase you can and should also identify any Conflict of Interest (CoI) with any submission. If you make use of sub-reviewers, you should make sure that they do not have a CoI either. In case of doubt, they should write something about this as part of their review, and the respective PC member should add this part in the ''Comments to the PC'' field in !EasyChair. Typical cases for a CoI for a submission are:

{{{#!html
<p style="color: darkblue">
1. One of the authors is your relative/significant other<br/>
2. One of the authors has been your advisor or PhD student in the last 10 years<br/>
3. One of the authors comes from the same department<br/>
4. You feel there is a CoI for another reason (e.g. you have, say, many joint publications)</p>
}}}
-Line 46:
+Line 57:
+The purpose of the numbering is so that it is easy to reference these items in the discussion. The numbering can but does not have to express a relative ranking of the strengths and weaknesses.
-Line 58:
+Line 71:
-Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]]. You can change your scores during the discussion phase, but it will greatly help the efficiency and quality of the process, if you hit the "right" score for a paper already in your initial review.
+Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the discussion phase. You can change your scores during the discussion phase, but it will greatly help the efficiency and quality of the process, if you hit the "right" score for a paper already in your initial review.
-Line 62:
+Line 75:
-<tr><th>Score</th><th>Verdict</th><th>Behavior during discussion</th></tr>
+<tr><th>Score</th><th>Assessment of submission</th><th>Behavior during discussion</th></tr>
-Line 70:
+Line 83:
-Remark 1: Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).
+Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).
-Line 72:
+Line 85:
-Remark 2: Some conferences disallow the borderline score of 0, to enforce a clear opinion on the reviewer. In the discussion phase, we indeed ask revievers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the ''hovering between +1 and −1'' in the table above.
+Some conferences disallow the borderline score of 0, to force a clear opinion from the reviewer. In the discussion phase, we indeed ask reviewers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the ''hovering between +1 and −1'' in the table above.
-Line 74:
+Line 87:
-Remark 2: Reviewers might not be fully aware yet of their behavior during the discussion phase for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. It is exactly one of the tasks of the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]] to bring the final scores (and reviews) closer to what they are supposed to reflect.
+Reviewers might not be fully aware yet of their behavior during the discussion phase for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. It is exactly one of the tasks of the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]] to bring the final scores (and reviews) closer to what they are supposed to reflect.
-Line 96:
+Line 109:
-To specify the decision process, it is useful to categorize submissions into the following Groups. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group. This will be described in more detail in the next section.
+To specify the decision process, it is useful to define the following partitioning into ''groups''. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group. This will be described in more detail in the next section.
-Line 101:
+Line 114:
-<b>Group A2 :</b> at least one champion + weak support from the others (good chance to be accepted)</p>
+<b>Group A2 :</b> at least one champion + weak support from the others (good chance to be accepted)<br/>
<b>Group A3 :</b> only weak support from everybody (might be accepted if room)</p>
-Line 104:
+Line 118:
-<b>Group C1 :</b> weak support + strong opposition (resolve or vote in the end)<br/>
+<b>Group C1 :</b> weak support + strong opposition (resolve or move to R2)<br/>
-Line 110:
+Line 124:
-<b>Group R2 :</b> weak opposition + no champion (will probably be rejected)</p>
+<b>Group R2 :</b> mix of strong and weak opposition (will probably be rejected)</p>
<b>Group R3 :</b> weak opposition from everybody (will probably be rejected)</p>
-Line 135:
+Line 150:
-<b>Round 2 :</b> A2 → accept?, R2 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution<br/>
<b>Round 3 :</b> C1 → reject?, C2 and C3 → like in Round 2<br/>
<b>Round 4 :</b> C2 and C3 → send email to PC with short summary for each of these + call for vote<br/>
+<b>Round 2 :</b> A2 → accept?, R2 → reject?, A3 and C1 → push for champion, C2 → challenge opposition, C3 → push for resolution<br/>
<b>Round 3 :</b> R3 → reject?, A3 → push for champion, C1 → reject?, C2 and C3 → like in Round 2<br/>
<b>Round 4 :</b> A3 and C2 and C3 → send email to PC with short summary for each of these + call for vote<br/>
-Line 142:
+Line 157:
-There should be as few submissions as possible left in Groups C1, C2, C3 by the end of Round 3. The vote is really just an emergency measure for submissions, where (despite all attempts), no reasonable consensus could be reached.
+There should be as few submissions as possible left in Groups C2, C3 by the end of Round 3. The vote is really just an emergency measure for submissions, where (despite all attempts), the controversy could not be resolved.

At the beginning of each round, the approximate number of "free slots" (total number of submissions that can be accepted minus the number of submissions already marked ''ACCEPT'') will be announced. This can of course influence the discussions and the fates of the submissions.

= Change Log =

A link to this web page was first sent to the PCs on April 20 00:41 UTC. The following list mentions only significant changes in content. Purely editorial changes or improvements in formulations which do not change the meaning significantly are not logged.

April 20 16:30 UTC: added information about conflicts of interest (for PC members and sub-reviewers) and added "approximate" to the last paragraph in the description of the discussion phase to make sure that PC members cannot deduce information about their CoI submissions from the current status quo.

April 21 15:12 UTC: made partition into groups slightly more fine-grained and more complete. Adapted the descriptions of the rounds accordingly.