This page describes the reviewing "algorithm" for each of the two ESA 2018 Track B PCs.

Because of the experiment and because it's a good idea anyway, we try to specify it beforehand. We will use a relatively standard algorithm: three independent reviews per submission, each with a score from {−2, −1, 0, +1, +2}, followed by a discussion phase in which the reviewers discuss the submission with each other and submissions are proposed for acceptance or rejection in rounds.

Although this algorithm is standard and used by many program committees, it is not easy to specify it completely. The basic procedure is clear, but there are many eventualities: most of them will not happen, some of them will, and it is hard to say in advance which. There is also a fair amount of complex human judgement involved that is hard to formalize. And there are many variations.

We try to be as specific as possible, without being overly complicated or impractical. The result is not a 100% complete and precise specification of the process. We will fill in the gaps and fix problems in a reasonable way as we go along. As far as the experiment is concerned, these conditions are not perfect, but reasonable given the complexity of the process and the agents involved. That being said, please let me know if you see a way to improve the following specification.

Contents

Schedule
Reviews
Discussion Phase

Schedule

The total time for the reviewing process (from the submission deadline to the author notification) is 8 weeks. The reviewing process proceeds in the following phases:

1. The deadline for submissions is April 22 AoE (strict)
2. Bidding and paper assignment: 1 week (~ April 23 - April 29)
3. Reviewing: 4 weeks (~ April 30 - May 27)
4. Discussion and recalibration of reviews: 2 weeks (~ May 28 - June 10)
5. Buffer for things going wrong or taking longer than expected: 1 week
6. The notification deadline is June 18 (maybe earlier)
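As a sanity check, the phase lengths above can be laid out on a calendar. This is a minimal Python sketch, assuming the year 2018 and that each phase starts on the day after the previous one ends (the page itself only gives approximate dates, marked with ~). The names `PHASES` and `phase_windows` are hypothetical and used only for illustration.

```python
from datetime import date, timedelta

# Phase lengths in weeks, as listed in the schedule (the buffer has no fixed dates there).
PHASES = [
    ("Bidding and paper assignment", 1),
    ("Reviewing", 4),
    ("Discussion and recalibration of reviews", 2),
    ("Buffer", 1),
]


def phase_windows(first_day: date):
    """Return (name, first day, last day) for consecutive phases starting on first_day."""
    windows = []
    start = first_day
    for name, weeks in PHASES:
        end = start + timedelta(weeks=weeks) - timedelta(days=1)
        windows.append((name, start, end))
        start = end + timedelta(days=1)  # next phase begins the following day
    return windows


if __name__ == "__main__":
    # Assumption: the first phase starts the day after the April 22 submission deadline.
    for name, start, end in phase_windows(date(2018, 4, 23)):
        print(f"{name}: {start:%b %d} - {end:%b %d}")
```

Under these assumptions, the windows match the approximate dates above, and the buffer week runs from June 11 to June 17, ending the day before the June 18 notification deadline.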

Reviews

We expect around 50 submissions. Each submission should receive 3 reviews (more reviews are possible, but this is the exception). Since each of the two PCs has exactly 12 members, this is an expected load of around 12 submissions per PC member.
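The load estimate is simple arithmetic: 50 submissions with 3 reviews each amount to 150 reviews, spread over 12 PC members. A short Python sketch makes this explicit; the function name `expected_load` is hypothetical.

```python
import math


def expected_load(submissions: int, reviews_per_submission: int, pc_size: int) -> float:
    """Average number of reviews each PC member has to write (or oversee)."""
    return submissions * reviews_per_submission / pc_size


load = expected_load(submissions=50, reviews_per_submission=3, pc_size=12)
print(load)             # 12.5
print(math.ceil(load))  # 13, the maximum load if the assignment is as even as possible
```

With a perfectly even assignment, half of the PC members would get 12 submissions and the other half 13.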

Sub-reviewers

We recommend that you review the submissions yourself, but you may ask sub-reviewers for some of the submissions if you prefer to do so. In any case, you should familiarize yourself with each submission assigned to you and with the review written for it, so that you can have a competent discussion with the other PC members. The discussion phase is an essential part of the reviewing process.

Guidelines for the Review Text

Each review should provide the following information:

1. A short summary of the main contribution(s) of the submission in the words of the reviewer
2. An itemized list of the strengths and weaknesses of the submission
2.1 The strengths should be numbered (S1), (S2), ...
2.2 The weaknesses should be numbered (W1), (W2), ...

Each review can also provide the following information (the authors will thank you):

3. More detailed explanations of the strengths and weaknesses
4. Comments to the authors for improving the paper

You can change your review text in the discussion phase. However, the discussion phase (and the whole reviewing process) will not work if the initial review is not substantial.

Guidelines for the Review Score

Each review should provide one of the following scores. Just like the text of your review, these scores are important for the discussion phase. You can change your score during the discussion phase, but it greatly helps the efficiency and quality of the process if you already hit the "right" score for a paper in your initial review.

Score	Verdict	Behavior during discussion
+2 (accept)	Good fit and no major weaknesses	I would champion this paper and fight against rejection
+1 (weak accept)	Significant weaknesses, but still acceptable	I would support this paper, but not fight against rejection
0 (borderline)	Hovering between +1 and −1	Not sure yet about the severity of the weaknesses / the threshold for ESA
−1 (weak reject)	Significant weaknesses, leaning towards rejection	I would not support this paper, but not fight against acceptance
−2 (reject)	Bad fit or major weaknesses	I would oppose this paper and fight against acceptance
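For tooling around the reviews (for example, scripts over an export of the scores), the scale in the table can be encoded as an enum. This is a hypothetical Python sketch, not part of EasyChair; the names `Score`, `STANCE`, and `parse_score` are made up for illustration. Values outside the five-point scale, such as +3, are rejected.

```python
from enum import IntEnum


class Score(IntEnum):
    """Review scores as defined in the table."""
    REJECT = -2
    WEAK_REJECT = -1
    BORDERLINE = 0
    WEAK_ACCEPT = 1
    ACCEPT = 2


# Stance during the discussion, paraphrasing the last column of the table.
STANCE = {
    Score.ACCEPT: "champion, fight against rejection",
    Score.WEAK_ACCEPT: "support, but not fight against rejection",
    Score.BORDERLINE: "undecided",
    Score.WEAK_REJECT: "not support, but not fight against acceptance",
    Score.REJECT: "oppose, fight against acceptance",
}


def parse_score(value: int) -> Score:
    """Convert a raw score; raises ValueError for anything outside {-2, ..., +2}."""
    return Score(value)
```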

Remark 1: Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).

Remark 2: Some conferences disallow the borderline score of 0 to force reviewers to take a clear position. In the discussion phase, we indeed ask reviewers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects a typical sentiment about a paper at this stage of the reviewing process, as expressed by the hovering between +1 and −1 in the table above.

Remark 3: Reviewers might not yet be fully aware of how they will behave in the discussion phase, for various reasons (for example: being unsure about some aspects of the paper or about the threshold for ESA Track B, or general inexperience in reviewing). This can make choosing the right score difficult. Bringing the final scores (and reviews) closer to what they are supposed to reflect is exactly one of the tasks of the discussion phase.

Discussion Phase

The discussion phase starts as soon as all the reviews are in. It lasts approximately two weeks; see the schedule above.

Beginning of the Discussion Phase

At the beginning of the discussion phase, each PC member should do the following (all the discussion and communication happens within EasyChair):

1. Read the reviews from the other reviewers
2. Comment on contrary arguments or ask questions if something is unclear
3. Adapt your review and possibly the score to what you have learned from the discussion
4. If your initial score was 0, change it to a nonzero score based on what you have learned from the other reviews and from the discussion
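Of the steps above, step 4 is the one that can be checked mechanically. A minimal Python sketch, assuming the scores of one submission are available as a mapping from (hypothetical) reviewer labels to scores; `pending_borderline` is a made-up name.

```python
from __future__ import annotations


def pending_borderline(scores_by_reviewer: dict[str, int]) -> list[str]:
    """Reviewers who still have to move their score away from 0 (step 4)."""
    return sorted(name for name, score in scores_by_reviewer.items() if score == 0)


print(pending_borderline({"R1": 2, "R2": 0, "R3": -1}))  # ['R2']
```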

Groups of submissions

To specify the decision process, it is useful to categorize submissions into the following groups. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group; this is described in more detail in the next section.

Group A1 : clear support (will probably be accepted)
Group A2 : at least one champion + weak support from the others (good chance to be accepted)

Group C1 : weak support + strong opposition (resolve or vote in the end)
Group C2 : strong support + weak opposition (resolve or vote in the end)
Group C3 : strong support + strong opposition (resolve or vote in the end)

Group R1 : strong opposition (will probably be rejected)
Group R2 : weak opposition + no champion (will probably be rejected)

Group X : two of the reviews are missing or completely lack substance (acquire missing/additional reviews)

The assignment of a submission to one of these groups will not be done by score alone, but also based on what is written in the reviews. Of course, there will be a strong correlation to the scores. In fact, if the scores were perfect, the correlation would be perfect. But it lies in the nature of the process that some reviewers (and PC members) are unsure about a submission or about the threshold for ESA. So one important part of the discussion phase is to bring the scores closer to what they are intended to reflect.

For example, a submission with scores {2, 2, 2} will probably be in Group A1 (unless the support expressed in the reviews is weaker than it might appear from the scores, in which case Group A2 might be more appropriate), and a submission with only negative scores will probably be in Group R1 (unless the reviews are more positive about the paper than it might appear from the scores, in which case Groups R2 or C1 might be more appropriate).
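Since the groups are defined in terms of support and opposition, a score-only first guess can be written down. The following Python sketch is a hypothetical heuristic, not the rule used by the PC: as stated above, the actual assignment also takes the review texts into account, and thresholds such as "at least two +2 scores for clear support" are assumptions. It reproduces the two examples above ({2, 2, 2} goes to A1, only negative scores go to R1). It can only detect missing reviews for Group X, not reviews that lack substance.

```python
from typing import Optional, Sequence


def first_guess_group(scores: Sequence[int]) -> Optional[str]:
    """Hypothetical score-only first guess for the group of one submission."""
    if len(scores) < 2:                  # two of three reviews missing
        return "X"
    champion = 2 in scores               # someone would fight for the paper
    strong_opposition = -2 in scores     # someone would fight against it
    weak_support = 1 in scores
    weak_opposition = -1 in scores

    if all(s < 0 for s in scores):
        return "R1"
    if champion and strong_opposition:
        return "C3"
    if champion and weak_opposition:
        return "C2"
    if strong_opposition:
        return "C1" if weak_support else "R1"
    if champion:
        # Assumed notion of clear support: two champions, nobody below +1.
        clear = scores.count(2) >= 2 and min(scores) >= 1
        return "A1" if clear else "A2"
    if weak_opposition:
        return "R2"
    return None  # e.g. {+1, +1, +1}: no score-only guess, left to the discussion
```

Cases the heuristic leaves open, such as three +1 scores, are exactly those where the review texts and the discussion have to decide.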

Submissions can change groups at any time due to the ongoing discussions and corresponding changes in the reviews and/or scores.

The group assignment of a submission can also be challenged by other PC members (who are not among its three original reviewers). For example, if another PC member formulates an argument against a submission from Group A2, that submission will move to Group C2 or C3.

No decision is final until the end of the discussion phase.

Decision Process (Rounds)

After the preparation above (or partly in parallel to it), the discussion will proceed in rounds. Each round lasts several days. In each round, the PC chair will suggest certain submissions for acceptance and others for rejection. In EasyChair, these submissions will be marked accept? and reject?, respectively. PC members can challenge these suggestions until the next round. From the second round on, submissions that were marked accept? or reject? in the previous round and were not challenged will be marked ACCEPT or REJECT. If nobody challenges these decisions anymore, they will become the final decisions for these submissions.
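The life cycle of these marks can be summarized as a small transition function. A minimal Python sketch; `PROMOTE` and `next_mark` are hypothetical names, and what happens to a challenged mark (here: it is dropped, so the submission is discussed again) is an assumption, since the page does not spell it out.

```python
from typing import Optional

# Tentative marks suggested by the PC chair and the firm marks they turn into.
PROMOTE = {"accept?": "ACCEPT", "reject?": "REJECT"}


def next_mark(mark: Optional[str], challenged: bool) -> Optional[str]:
    """Mark of a submission at the start of the next round.

    Assumption: a challenged mark (tentative or firm) is dropped and the
    submission returns to the discussion.
    """
    if challenged:
        return None
    # accept? -> ACCEPT, reject? -> REJECT; ACCEPT, REJECT and None stay unchanged.
    return PROMOTE.get(mark, mark)
```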

Submissions that have changed groups will be treated as submissions of their new group were treated in an earlier round. For example, if the opposition to a submission from Group C2 (strong support + weak opposition) crumbles, the submission moves to Group A2 and will be suggested for accept? in the next round. Or, if the support for a submission from Group C1 (weak support + strong opposition) crumbles, the submission moves to Group R2 and will be suggested for reject? in the next round.

Round 1 : A1 → accept?, R1 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 2 : A2 → accept?, R2 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 3 : C1 → reject?, C2 and C3 → like in Round 2
Round 4 : C2 and C3 → send email to PC with short summary for each of these + call for vote
Round 5 : Suggestion for final decisions
Round 6 : Finalize decisions
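Together with the rule that a submission that changes groups is treated as its new group was treated in an earlier round, the list above can be read as a lookup table. A hypothetical Python sketch (the names `ROUND_ACTIONS` and `action` are made up), covering only Rounds 1 to 4, which assign actions per group; Rounds 5 and 6 concern all submissions.

```python
from typing import Optional

# Actions per round and group, transcribed from the list of rounds.
ROUND_ACTIONS = {
    1: {"A1": "accept?", "R1": "reject?", "C1": "push for champion",
        "C2": "challenge opposition", "C3": "push for resolution"},
    2: {"A2": "accept?", "R2": "reject?", "C1": "push for champion",
        "C2": "challenge opposition", "C3": "push for resolution"},
    3: {"C1": "reject?", "C2": "challenge opposition", "C3": "push for resolution"},
    4: {"C2": "summary to PC + call for vote", "C3": "summary to PC + call for vote"},
}


def action(group: str, round_no: int) -> Optional[str]:
    """Action for a submission of `group` in round `round_no` (1 to 4).

    A group without an entry in this round is treated as in the most recent
    earlier round that has one (e.g. a submission that moved to A2 in Round 3).
    """
    if round_no not in ROUND_ACTIONS:
        raise ValueError("only Rounds 1 to 4 assign per-group actions")
    for r in range(round_no, 0, -1):
        if group in ROUND_ACTIONS[r]:
            return ROUND_ACTIONS[r][group]
    return None  # e.g. Group X: acquire missing reviews instead
```

For instance, `action("A2", 3)` yields accept?, and `action("C1", 4)` yields reject?, because C1 submissions were suggested for rejection in Round 3.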

By the end of Round 3, as few submissions as possible should be left in Groups C1, C2, and C3. The vote is really just an emergency measure for submissions for which, despite all attempts, no reasonable consensus could be reached.

Revision 66 as of 2018-04-20 00:50:40, editor: Hannah Bast