Differences between revisions 27 and 52 (spanning 25 versions)

This page describes the reviewing "algorithm" for each of the two ESA 2018 Track B PCs. You can go back to the main page here.

Contents

Schedule
Expected Load and Sub-reviewers
Guidelines for the Review Text
Guidelines for the Review Score

Schedule

The total time for the reviewing process (from the submission deadline to the author notification) is 8 weeks. The reviewing process proceeds in the following phases:

1. The deadline for submissions is April 22 AoE (strict)
2. Bidding and paper assignment: 1 week (~ April 23 - April 29)
3. Reviewing: 4 weeks (~ April 30 - May 27)
4. Discussion and recalibration of reviews: 2 weeks (~ May 28 - June 10)
5. Buffer for things going wrong or taking longer than expected: 1 week
6. The notification deadline is June 18 (maybe earlier)
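
The phase dates above follow mechanically from the submission deadline and the phase lengths. The following is a minimal sketch of that calculation, not part of the official process; the function name `phase_dates` and the constant names are hypothetical, and the year 2018 is taken from the conference name.

```python
from datetime import date, timedelta

# Assumption: the deadline of April 22 refers to the year 2018 (ESA 2018).
SUBMISSION_DEADLINE = date(2018, 4, 22)

# Phase names and lengths in weeks, as listed in the schedule.
PHASES = [
    ("Bidding and paper assignment", 1),
    ("Reviewing", 4),
    ("Discussion and recalibration of reviews", 2),
    ("Buffer", 1),
]


def phase_dates(deadline=SUBMISSION_DEADLINE, phases=PHASES):
    """Return ([(name, first_day, last_day), ...], first_day_after_last_phase)."""
    schedule = []
    start = deadline + timedelta(days=1)  # first phase starts the day after the deadline
    for name, weeks in phases:
        end = start + timedelta(weeks=weeks) - timedelta(days=1)
        schedule.append((name, start, end))
        start = end + timedelta(days=1)
    return schedule, start


if __name__ == "__main__":
    schedule, notification = phase_dates()
    for name, first, last in schedule:
        print(f"{name}: {first} to {last}")
    print(f"Notification deadline: {notification}")
```

Under these assumptions, the day after the buffer week coincides with the notification deadline of June 18.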

Expected Load and Sub-reviewers

We expect around 50 submissions. Each submission should receive 3 reviews (more reviews are possible, but this is the exception). Each of the two PCs has exactly 12 members. This is an expected load of around 12 submissions per PC member.
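
The load estimate is simple arithmetic. As a small sketch (the function name `expected_load` is hypothetical), assuming that each PC reviews all submissions on its own and that reviews are spread evenly over its members:

```python
def expected_load(submissions, reviews_per_submission, pc_members):
    """Average number of reviews per PC member, assuming an even spread."""
    return submissions * reviews_per_submission / pc_members


if __name__ == "__main__":
    # 50 submissions, 3 reviews each, 12 PC members
    print(expected_load(50, 3, 12))  # 12.5
```

The actual load per member will vary with the number of submissions and with the outcome of the bidding.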

We recommend that you review the submissions yourself, but you may ask sub-reviewers for some of the submissions if you prefer to do so. In any case, you should familiarize yourself with each submission assigned to you and its review, so that you can have a competent discussion with the other PC members. The discussion phase is an essential part of the reviewing process.

Guidelines for the Review Text

Each review should provide the following information:

1. A short summary of the main contribution(s) of the submission in the words of the reviewer
2. An itemized list of the strengths and weaknesses of the submission
2.1 The strengths should be numbered (S1), (S2), ...
2.2 The weaknesses should be numbered (W1), (W2), ...

Each review can also provide the following information (the authors will thank you):

3. More detailed explanations of the strengths and weaknesses
4. Comments to the authors for improving the paper

You can change the text of your review in the discussion phase. However, the discussion phase (and the whole reviewing process) will not work if your initial review is not substantial.

Guidelines for the Review Score

Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the discussion phase. You can change your scores during the discussion phase, but it will greatly help the efficiency and quality of the process if you hit the "right" score for a paper already in your initial review.

Score	Verdict	Behavior during discussion
+2 (accept)	Good fit and no major weaknesses	I would champion this paper and fight against rejection
+1 (weak accept)	Significant weaknesses, but still acceptable	I would support this paper, but not fight against rejection
0 (borderline)	Hovering between +1 and −1	Not sure yet about the severity of the weaknesses / the threshold for ESA
−1 (weak reject)	Significant weaknesses, lean to reject	I would not support this paper, but not fight against acceptance
−2 (reject)	Bad fit or major weaknesses	I would oppose this paper and fight against acceptance

Remark 1: Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).

Remark 2: Some conferences disallow the borderline score of 0, to force the reviewer to commit to a clear opinion. In the discussion phase, we indeed ask reviewers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the hovering between +1 and −1 in the table above.

Remark 3: Reviewers might not yet be fully aware of how they will behave during the discussion phase, for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. It is exactly one of the tasks of the discussion phase to bring the final scores (and reviews) closer to what they are supposed to reflect.

Discussion Phase

The discussion phase starts as soon as all the reviews are in. It lasts approximately two weeks; see the schedule above.

We will use a relatively standard "algorithm". Though standard and used in many program committees, it is not easy to give a full specification of this algorithm. The basic procedure is clear, but there are many eventualities, most of which will not happen, but some of which will, and it's hard to say in advance which. There is also a fair amount of complex human judgement involved that is hard to formalize. And there are many variations of this standard algorithm.

We try our best anyway. We try to be as specific as possible, without being overly complicated or impractical. The result is not a 100% complete and precise specification of the process. We will fill in the gaps and fix problems in a reasonable way as we go along. As far as the experiment is concerned, these conditions are not perfect, but reasonable given the complexity of the process and the agents involved. That being said, please let me know if you see a way to improve the specification below without making it overly complicated or impractical.

Beginning of the Discussion Phase

At the beginning of the discussion phase, each PC member should do the following (all the discussion and communication happens within EasyChair):

1. Read the reviews from the other reviewers
2. Comment on contrary arguments or ask questions if something is unclear
3. Adapt your review and possibly the score to what you have learned from the discussion
4. If your initial score was 0, change it away from 0 based on what you have learned from the other reviews and from the discussion

Groups of submissions

To specify the decision process, it is useful to categorize submissions into the following Groups. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group. This will be described in more detail in the next section.

Group A1 : clear support (will probably be accepted)
Group A2 : at least one champion + weak support from the others (good chance to be accepted)

Group C1 : weak support + strong opposition (resolve or vote in the end)
Group C2 : strong support + weak opposition (resolve or vote in the end)
Group C3 : strong support + strong opposition (resolve or vote in the end)

Group R1 : strong opposition (will probably be rejected)
Group R2 : weak opposition + no champion (will probably be rejected)

Group X : two of the reviews are missing or completely lack substance (acquire missing/additional reviews)

The assignment of a submission to one of these groups will not be done by score alone, but also based on what is written in the reviews. Of course, there will be a strong correlation to the scores. In fact, if the scores were perfect, the correlation would be perfect. But it lies in the nature of the process that some reviewers (and PC members) are unsure about a submission or about the threshold for ESA. So one important part of the discussion phase is to bring the scores closer to what they are intended to reflect.

For example, a submission with scores {2, 2, 2} will probably be in Group A1 (unless the support expressed in the reviews is weaker than it might appear from the scores, in which case Group A2 might be more appropriate), and a submission with only negative scores will probably be in Group R1 (unless the reviews are more positive about the paper than it might appear from the scores, in which case Groups R2 or C1 might be more appropriate).
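
As a rough illustration of how scores correlate with the groups, here is a minimal sketch of a score-only first guess. It is not the assignment procedure itself: as stated above, the actual assignment also depends on the review text. The function name `guess_group` and the exact rules are assumptions made for this sketch; score patterns that the group list does not clearly cover (for example only weak support, or a remaining score of 0) return `None`.

```python
def guess_group(scores):
    """Score-only first guess of a submission's group (hypothetical heuristic).

    scores: list of review scores in {-2, -1, 0, +1, +2}, one per review received.
    Returns a group name, or None if the scores alone do not suggest a group.
    """
    if len(scores) <= 1:  # assumption: two of the three expected reviews are missing
        return "X"
    if any(s == 0 for s in scores):
        return None  # borderline scores should be resolved in the discussion first

    champion = any(s == 2 for s in scores)
    strong_opposition = any(s == -2 for s in scores)
    support = any(s > 0 for s in scores)
    opposition = any(s < 0 for s in scores)

    if not opposition:
        if all(s == 2 for s in scores):
            return "A1"  # clear support
        return "A2" if champion else None  # only weak support: not covered by the list
    if not support:
        return "R1"  # only negative scores, as in the example above
    if champion and strong_opposition:
        return "C3"
    if champion:
        return "C2"  # strong support + weak opposition
    if strong_opposition:
        return "C1"  # weak support + strong opposition
    return "R2"  # weak opposition + no champion


if __name__ == "__main__":
    for example in ([2, 2, 2], [2, 1, 1], [1, -2, 1], [2, -1, 1], [2, -2, 1], [-1, -2, -1]):
        print(example, guess_group(example))
```

Such a guess is only a starting point; the PC chair and the reviewers may place a submission in a different group based on what the reviews actually say.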

Submissions can change groups at any time due to the ongoing discussions and corresponding changes in the reviews and/or scores.

The group assignment of a submission can also be challenged by other PC members (who did not write one of the three original reviews for the submission). For example, if another PC member formulates an argument against a submission from Group A2, that submission will go into Groups C2 or C3.

No decision is final until the end of the discussion phase.

Decision Process (Rounds)

After the preparation above (or partly in parallel with it), the discussion will proceed in rounds. Each round lasts several days. In each round, the PC chair will suggest certain submissions for acceptance and others for rejection. In EasyChair, these submissions will be marked accept? and reject?. PC members can challenge these suggestions until the next round. In every round after the first, submissions that were marked accept? or reject? in the previous round and were not challenged will be marked ACCEPT and REJECT, respectively. If nobody challenges these decisions anymore, they become the final decisions for these submissions.
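
The marking rule in this paragraph can be read as a small state transition. The sketch below is only an illustration: the function name `next_mark` and the label `"open"` for a challenged submission are hypothetical, since the text does not name the state a submission returns to after a challenge.

```python
def next_mark(mark, challenged):
    """Mark of a submission at the start of the next round (illustrative only).

    mark: one of "accept?", "reject?", "ACCEPT", "REJECT"
    challenged: True if a PC member challenged the mark during the round
    """
    if challenged:
        return "open"  # hypothetical label: back to discussion
    promotions = {"accept?": "ACCEPT", "reject?": "REJECT"}
    # Unchallenged suggestions are promoted; unchallenged decisions stay as they are.
    return promotions.get(mark, mark)


if __name__ == "__main__":
    print(next_mark("accept?", challenged=False))
    print(next_mark("reject?", challenged=True))
```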

Submissions that have changed groups will be treated as they would have been treated in their new group in a previous round. For example, if for a submission from Group C2 (strong support + weak opposition) the opposition crumbles, the submission moves to Group A2 and will be suggested for accept? in the next round. Or, if for a submission from Group C1 (weak support + strong opposition) the support crumbles, the submission moves to Group R2 and will be suggested for reject? in the next round.

Round 1 : A1 → accept?, R1 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 2 : A2 → accept?, R2 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution
Round 3 : C1 → reject?, C2 and C3 → like in Round 2
Round 4 : C2 and C3 → send email to PC with short summary for each of these + call for vote
Round 5 : Suggestion for final decisions
Round 6 : Finalize decisions
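
The per-round actions above, together with the rule for submissions that change groups, can be written down as a lookup table. This is a sketch under the assumption that a submission in a given group receives the most recent action listed for that group in the current or an earlier round; the names `ROUND_ACTIONS` and `action_for` are hypothetical. Rounds 5 and 6 apply to all remaining submissions and are not modeled.

```python
# Actions per round and group, transcribed from the list of rounds.
ROUND_ACTIONS = {
    1: {"A1": "accept?", "R1": "reject?", "C1": "push for champion",
        "C2": "challenge opposition", "C3": "push for resolution"},
    2: {"A2": "accept?", "R2": "reject?", "C1": "push for champion",
        "C2": "challenge opposition", "C3": "push for resolution"},
    3: {"C1": "reject?", "C2": "challenge opposition", "C3": "push for resolution"},
    4: {"C2": "call for vote", "C3": "call for vote"},
}


def action_for(group, round_no):
    """Most recent action listed for `group` in rounds 1..round_no, or None."""
    for r in range(min(round_no, 4), 0, -1):
        if group in ROUND_ACTIONS[r]:
            return ROUND_ACTIONS[r][group]
    return None


if __name__ == "__main__":
    # A submission that moved from C2 to A2 before Round 3 is suggested for accept?.
    print(action_for("A2", 3))
    print(action_for("C1", 3))
```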

There should be as few submissions as possible left in Groups C1, C2, and C3 by the end of Round 3. The vote is really just an emergency measure for submissions where, despite all attempts, no reasonable consensus could be reached.

-  ⇤ ← Revision 27 as of 2018-04-19 00:22:27 → 
  Size: 4105
  Editor: Hannah Bast
  Comment:
+   ← Revision 52 as of 2018-04-19 23:24:57 → ⇥
  Size: 12048
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-This page is about the reviews for the ESA 2018 Track B PCs. We expect around 50 submissions. Each submission should receive 3 reviews (more reviews are possible, but this is the exception). Each of the two PCs has exactly 12 members. This is an expected load of around 12 submissions per PC member.
+This page describes the reviewing "algorithm" for each of the two ESA 2018 Track B PCs. You can go back to the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment|main page]] here.
 Line 3:
+<<TableOfContents(2)>>
-Line 4:
+Line 5:
-<<TableOfContents(1)>>
+= Schedule =
-Line 6:
+Line 7:
-= Sub-reviewers =

We recommend that you review the papers yourself, but you may ask sub-reviewers for some of the submissions if you prefer to do so. In any case, you should familiarize yourself with the paper and the review, so that you can have a competent discussion with the other PC members. The [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]] is an essential part of the reviewing process.

= Review Text =

Each review ''must'' provide the following information:
+The total time for the reviewing process (from the submission deadline to the author notification) is 8 weeks. The reviewing process proceeds in the following phases:
-Line 15:
+Line 10:
+<p style="color: darkblue">
1. The deadline for submissions is April 22 AoE (strict)<br/>
2. Bidding and paper assignment: 1 week (~ April 23 - April 29)<br/>
3. Reviewing: 4 weeks (~ April 30 - May 27)<br/>
4. Discussion and recalibration of reviews: 2 weeks (~ May 28 - June 10)<br/>
5. Buffer for things going wrong or taking longer than expected: 1 week<br/>
6. The notification deadline is June 18 (maybe earlier)</p>
}}}

= Expected Load and Sub-reviewers =

We expect around 50 submissions. Each submission should receive 3 reviews (more reviews are possible, but this is the exception). Each of the two PCs has exactly 12 members. This is an expected load of around 12 submissions per PC member.

We recommend that you review the submissions yourself, but you may ask sub-reviewers for some of the submissions if you prefer to do so. In any case, you should familiarize yourself with each submission assigned to you and its review, so that you can have a competent discussion with the other PC members. The [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]] is an essential part of the reviewing process.

= Guidelines for the Review Text =

Each review ''should'' provide the following information:

{{{#!html
<!-- <div style="display: inline-block; border: 2px solid #f0f0f0; background-color:#f9f9f9; text-align: left; vertical-align: middle; padding:10px 20px"> -->
-Line 32:
+Line 48:
-= Review Score =
+= Guidelines for the Review Score =
-Line 34:
+Line 50:
-Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]]. You can change your scores during the discussion phase, but it will greatly help the efficieny and quality of the process, if you hit the "right" score for a paper already in your initial review.
+Each review should provide one of the following scores. Just like the text of your reviews, these scores are important for the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]]. You can change your scores during the discussion phase, but it will greatly help the efficiency and quality of the process, if you hit the "right" score for a paper already in your initial review.
-Line 36:
+Line 52:
-|| '''Score'''      || '''Verdict''' || '''Behavior during discussion''' ||
|| +2 (accept)      || Good fit and no major weaknesses || I would champion this paper and fight against rejection ||
|| +1 (weak accept) || Significant weaknesses, but nothing fatal || I would support this paper, but not fight against rejection ||
|| &nbsp;&nbsp;0 (borderline)   || Hovering between +1 and −1 || Not sure yet about the severity of the weaknesses / the threshold for ESA ||
|| −1 (weak reject) || Significant weaknesses, but nothing fatal || I would not support this paper, but not fight against acceptance ||
|| −2 (reject)      || Bad fit or major weaknesses || I would oppose this paper and fight against acceptance ||
+{{{#!html
<table style="color: darkblue">
<tr><th>Score</th><th>Verdict</th><th>Behavior during discussion</th></tr>
<tr><td>+2 (accept)</td><td>Good fit and no major weaknesses</td><td>I would champion this paper and fight against rejection</td></tr>
<tr><td>+1 (weak accept)</td><td>Significant weaknesses, but still acceptable</td><td>I would support this paper, but not fight against rejection</td></tr>
<tr><td>&nbsp;&nbsp;0 (borderline)</td><td>Hovering between +1 and −1</td><td>Not sure yet about the severity of the weaknesses / the threshold for ESA</td></tr>
<tr><td>−1 (weak reject)</td><td>Significant weaknesses, lean to reject</td><td>I would not support this paper, but not fight against acceptance</td></tr>
<tr><td>−2 (reject)</td><td>Bad fit or major weaknesses</td><td>I would oppose this paper and fight against acceptance</td></tr></table>
}}}
-Line 43:
+Line 62:
-Remark 1: Some conference also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).
+Remark 1: Some conferences also have +3 (strong accept) and −3 (strong reject). Experience shows that they are of little use for deciding on the set of accepted papers for a moderate number of submissions, as in ESA Track B (around 50).
-Line 45:
+Line 64:
-Remark 2: Some conferences disallow the borderline score of 0, to enforce a clear opininion on the reviewer. In the discussion phase, we indeed ask revievers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the ''hovering between +1 and −1'' in the table above.
+Remark 2: Some conferences disallow the borderline score of 0, to enforce a clear opinion on the reviewer. In the discussion phase, we indeed ask revievers to commit to one of the other scores. But for the first review, we think it makes sense to allow this score, because it reflects one of the typical sentiments about a paper at this stage of the reviewing process, as expressed by the ''hovering between +1 and −1'' in the table above.
-Line 47:
+Line 66:
-Remark 2: Reviewers might not be fully aware yet of their behavior during the discussion phase for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. But this is exactly one of the tasks of the discussion phase (described below): to bring the final scores (and reviews) closer to what they were supposed to reflect.
+Remark 2: Reviewers might not be fully aware yet of their behavior during the discussion phase for various reasons (for example: not sure about some aspects of the paper, not sure about the nature of the threshold for ESA Track B, general inexperience in reviewing). This can make choosing the right score difficult. It is exactly one of the tasks of the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/DiscussionPhase|discussion phase]] to bring the final scores (and reviews) closer to what they are supposed to reflect.


= Discussion Phase = 

The discussion phase starts as soon as all the reviews are in. It lasts approximately two weeks; see the [[http://ad-wiki.informatik.uni-freiburg.de/research/ESA2018Experiment/ReviewingAlgorithm|schedule]] above.

We will use a relatively standard "algorithm". Though standard and used in many program committees, it is not easy to give a full specification of this algorithm. The basic procedure is clear, but there are many eventualities, most of which will not happen, but some of which will, and it's hard to say in advance which. There is also a fair amount of complex human judgement involved that is hard to formalize. And there are many variations of this standard algorithm.

We try our best anyway. We try to be as specific as possible, without being overly complicated or impractical. The result is not a 100% complete and precise specification of the process. We will fill in the gaps and fix problems in a reasonable way as we go along. As far as the experiment is concerned, these conditions are not perfect, but reasonable given the complexity of the process and the agents involved. That being said, please let [[https://ad.informatik.uni-freiburg.de/staff/bast|me]] know if you see a way to improve the specification below without making it overly complicated or impractical.

== Beginning of the Discussion Phase ==

At the beginning of the discussion phase, each PC member should do the following (all the discussion and communication happens within !EasyChair):

{{{#!html
<p style="color: darkblue">
1. Read the reviews from the other reviewers<br/>
2. Comment on contrary arguments or ask questions if something is unclear<br/>
3. Adapt your review and possibly the score to what you have learned from the discussion<br/>
4. If your initial score was 0, change it away from 0 based on what you have learned from the other reviews and from the discussion</p>
}}}

== Groups of submissions ==

To specify the decision process, it is useful to categorize submissions into the following Groups. Except for Group X (which hopefully will be empty), the descriptions assume that there are at least three reviews for each submission. The description in parentheses says what is likely to happen to a submission in this group. This will be described in more detail in the next section.

{{{#!html
<p style="color: darkblue">
<b>Group A1 :</b> clear support (will probably be accepted)<br/>
<b>Group A2 :</b> at least one champion + weak support from the others (good chance to be accepted)</p>

<p style="color: darkblue">
<b>Group C1 :</b> weak support + strong opposition (resolve or vote in the end)<br/>
<b>Group C2 :</b> strong support + weak opposition (resolve or vote in the end)<br/>
<b>Group C3 :</b> strong support + strong opposition (resolve or vote in the end)</p>

<p style="color: darkblue">
<b>Group R1 :</b> strong opposition (will probably be rejected)<br/>
<b>Group R2 :</b> weak opposition + no champion (will probably be rejected)</p>

<p style="color: darkblue">
<b>Group X&nbsp;&nbsp; :</b>  two of the reviews are missing or completely lack substance (acquire missing/additional reviews)</p>
}}}

The assignment of a submission to one of these groups will not be done by score alone, but also based on what is written in the reviews. Of course, there will be a strong correlation to the scores. In fact, if the scores were perfect, the correlation would be perfect. But it lies in the nature of the process that some reviewers (and PC members) are unsure about a submission or about the threshold for ESA. So one important part of the discussion phase is to bring the scores closer to what they are intended to reflect.

For example, a submission with scores {2, 2, 2} will probably be in Group A1 (unless the support expressed in the reviews is weaker than it might appear from the scores, in which case Group A2 might be more appropriate), and a submission with only negative scores will probably be in Group R1 (unless the reviews are more positive about the paper than it might appear from the scores, in which case Groups R2 or C1 might be more appropriate).

Submissions can change groups at any time due to the ongoing discussions and corresponding changes in the reviews and/or scores.

The group assignment of a submission can also be challenged by other PC members (who did not write one of the three original reviews for the submission). For example, if another PC member formulates an argument against a submission from Group A2, that submission will go into Groups C2 or C3.

No decision is final until the end of the discussion phase.

== Decision Process (Rounds) ==

After the preparation above (or partly in parallel with it), the discussion will proceed in rounds. Each round lasts several days. In each round, the PC chair will suggest certain submissions for acceptance and others for rejection. In !EasyChair, these submissions will be marked ''accept?'' and ''reject?''. PC members can challenge these suggestions until the next round. In every round after the first, submissions that were marked ''accept?'' or ''reject?'' in the previous round and were not challenged will be marked ''ACCEPT'' and ''REJECT'', respectively. If nobody challenges these decisions anymore, they become the final decisions for these submissions.

Submissions that have changed groups will be treated as they would have been treated in their new group in a previous round. For example, if for a submission from Group C2 (strong support + weak opposition) the opposition crumbles, the submission moves to Group A2 and will be suggested for ''accept?'' in the next round. Or, if for a submission from Group C1 (weak support + strong opposition) the support crumbles, the submission moves to Group R2 and will be suggested for ''reject?'' in the next round.

{{{#!html
<p style="color: darkblue">
<b>Round 1 :</b> A1 → accept?, R1 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution<br/>
<b>Round 2 :</b> A2 → accept?, R2 → reject?, C1 → push for champion, C2 → challenge opposition, C3 → push for resolution<br/>
<b>Round 3 :</b> C1 → reject?, C2 and C3 → like in Round 2<br/>
<b>Round 4 :</b> C2 and C3 → send email to PC with short summary for each of these + call for vote<br/>
<b>Round 5 :</b> Suggestion for final decisions<br/>
<b>Round 6 :</b> Finalize decisions</p>
}}}

There should be as few submissions as possible left in Groups C1, C2, and C3 by the end of Round 3. The vote is really just an emergency measure for submissions where, despite all attempts, no reasonable consensus could be reached.