Many of our projects (student projects as well as research projects) run on one of our local machines, for example on vulcano.informatik.privat:8400, but should be accessible from the outside, for example via http://aqqu.cs.uni-freiburg.de/proxy-test. This is non-trivial when the latter URL is not a top-level URL, like http://aqqu.cs.uni-freiburg.de, but contains a path, like the /proxy-test.
Apache Config
The first step is to find out which virtual host handles the domain. In the example above, it's the Apache virtual host configured in filicudi:/etc/apache2/sites-available/aqqu.conf, which contains (among others) the following lines:
<VirtualHost *:80>
    ServerName aqqu.informatik.uni-freiburg.de
    ServerAlias aqqu aqqu.cs.uni-freiburg.de
    [...]
    RewriteEngine On
    RewriteRule /proxy-test$ /proxy-test/ [R]
    ProxyPass /proxy-test http://vulcano.informatik.privat:8400
    ProxyPassReverse /proxy-test http://vulcano.informatik.privat:8400
    [...]
</VirtualHost>
The effect of the RewriteRule line is that when somebody types http://aqqu.cs.uni-freiburg.de/proxy-test in the browser (without a trailing slash), the browser will be redirected to http://aqqu.cs.uni-freiburg.de/proxy-test/. The reason for this redirect is that the missing slash is problematic for many web applications; if you are curious, see the section "Experimenting with a proxy without the redirect" below. In general, small details in the configuration file, such as whether a path ends with a / or not, matter a lot.
The effect of the ProxyPass line is that when somebody enters http://aqqu.cs.uni-freiburg.de/proxy-test/<something> in the browser, then vulcano.informatik.privat:8400 gets the request /<something>. Due to the redirect described in the previous paragraph, when somebody enters http://aqqu.cs.uni-freiburg.de/proxy-test (without a trailing slash), the combined effect of the RewriteRule and the ProxyPass line is that vulcano.informatik.privat:8400 gets the following GET request:
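To make the combined effect of the two lines concrete, here is a minimal Python sketch that models them as a function from the path a browser requests on aqqu.cs.uni-freiburg.de to what Apache does with it. The function name handle_request and the three result tags are hypothetical. This is a simplified model under the assumption that only these two lines matter (query strings and the rest of the configuration are ignored), not Apache's actual implementation.

```python
import re

# Values taken from the Apache configuration above.
PREFIX = "/proxy-test"
BACKEND = "http://vulcano.informatik.privat:8400"


def handle_request(path):
    """Hypothetical, simplified model of the RewriteRule and ProxyPass lines.

    Returns a (tag, url) pair: ("redirect", ...) for the RewriteRule,
    ("proxy", ...) for ProxyPass, and ("local", ...) for everything else.
    """
    # RewriteRule /proxy-test$ /proxy-test/ [R]
    # -> tell the browser to ask again, this time with a trailing slash.
    if re.search(r"/proxy-test$", path):
        return ("redirect", "/proxy-test/")
    # ProxyPass /proxy-test http://vulcano.informatik.privat:8400
    # -> strip the prefix and forward the rest of the path to the backend.
    if path.startswith(PREFIX):
        return ("proxy", BACKEND + path[len(PREFIX):])
    # Anything else is left to the rest of the virtual host configuration.
    return ("local", path)


print(handle_request("/proxy-test"))            # ('redirect', '/proxy-test/')
print(handle_request("/proxy-test/"))           # ('proxy', 'http://vulcano.informatik.privat:8400/')
print(handle_request("/proxy-test/style.css"))  # ('proxy', 'http://vulcano.informatik.privat:8400/style.css')
```

The first two calls together correspond to a user typing the URL without a trailing slash: first a redirect, then a request for / on the backend.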
GET / HTTP/1.1
...
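The ProxyPassReverse line is needed when the server at vulcano.informatik.privat:8400 responds with a redirect: it adjusts the URL in the Location, Content-Location and URI headers of such responses, so that the browser is pointed to the corresponding public URL under /proxy-test instead of the internal address. For a detailed explanation, see https://serverfault.com/questions/774041/what-is-the-use-of-proxypassreverse-directive . Otherwise, the line does no harm, so we always add it to the Apache configuration file. This is also easy, because it is identical to the ProxyPass line, except that the directive is called ProxyPassReverse.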
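The following minimal Python sketch models the header rewriting that ProxyPassReverse performs on a redirect from the backend. It is restricted to the Location header and is a hypothetical simplification, not Apache's implementation. The value of PUBLIC assumes that the original request reached Apache under the host name aqqu.cs.uni-freiburg.de.

```python
# Values taken from the ProxyPassReverse line above; PUBLIC assumes the
# request reached Apache under the host name aqqu.cs.uni-freiburg.de.
BACKEND = "http://vulcano.informatik.privat:8400"
PUBLIC = "http://aqqu.cs.uni-freiburg.de/proxy-test"


def reverse_location(location):
    """Hypothetical, simplified model of ProxyPassReverse for one Location value."""
    # A redirect to the internal backend URL is mapped back to the public URL.
    if location.startswith(BACKEND):
        return PUBLIC + location[len(BACKEND):]
    # Other redirect targets are passed through unchanged.
    return location


# The backend redirects to its own /login page ...
print(reverse_location("http://vulcano.informatik.privat:8400/login"))
# -> http://aqqu.cs.uni-freiburg.de/proxy-test/login
```

Without this rewriting, the browser would receive the internal URL, which it typically cannot reach from the outside.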
Requirements of the web application
It is important that the web application uses only relative paths, or absolute paths that take the path in the browser URL (here: the /proxy-test prefix) into account. For example, assume that somebody entered http://aqqu.cs.uni-freiburg.de/proxy-test/ into the browser and the first thing the web application does is to serve the following HTML file:
<html>
  <head>
    <link href="style.css" rel="stylesheet" type="text/css">
    <script src="script.js"></script>
  </head>
  <body>
    <h1>A simple HTML page</h1>
  </body>
</html>
Then the browser will send two GET requests with the following URLs:
http://aqqu.cs.uni-freiburg.de/proxy-test/style.css
http://aqqu.cs.uni-freiburg.de/proxy-test/script.js
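Browsers resolve relative URLs against the URL of the current page according to RFC 3986, and Python's urllib.parse.urljoin follows the same rules (since Python 3.5). The following sketch uses it to contrast the relative path style.css with an absolute path that forgets the /proxy-test prefix; the variable name page is just for illustration.

```python
from urllib.parse import urljoin

# URL of the page in the browser (with trailing slash, as after the redirect).
page = "http://aqqu.cs.uni-freiburg.de/proxy-test/"

# Relative path: resolved below /proxy-test/, so ProxyPass forwards it.
print(urljoin(page, "style.css"))
# -> http://aqqu.cs.uni-freiburg.de/proxy-test/style.css

# Absolute path without the prefix: /proxy-test is lost, so the request
# is not forwarded by the ProxyPass line.
print(urljoin(page, "/style.css"))
# -> http://aqqu.cs.uni-freiburg.de/style.css

# Absolute path that includes the prefix: also fine.
print(urljoin(page, "/proxy-test/style.css"))
# -> http://aqqu.cs.uni-freiburg.de/proxy-test/style.css
```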
This is good because these URLs match the proxy configuration from above. To serve them, Apache will request the following URLs from the backend:
http://vulcano.informatik.privat:8400/style.css
http://vulcano.informatik.privat:8400/script.js
The first lines of the resulting GET requests received by vulcano.informatik.privat:8400 will then be as follows:
GET /style.css HTTP/1.1
GET /script.js HTTP/1.1
Note that the first line of the GET request does not contain the proxy-test prefix. However, this information is given in one of the header lines that follow the first line, namely the Referer: ... line:
Host: vulcano.informatik.privat:8400
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
Accept: */*
Referer: http://aqqu.cs.uni-freiburg.de/proxy-test
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,de-DE;q=0.8,de;q=0.7
Cookie: _ga=GA1.2.1897526143.1483962283; ioam2018=001a120e44815d0655dc1c0e9:1599244525817:1572978925817:.uni-freiburg.de:10:ak025:dbs:noevent:1576359561488:y4r1b0
X-Forwarded-For: 132.230.239.31
X-Forwarded-Host: aqqu.cs.uni-freiburg.de
X-Forwarded-Server: aqqu.informatik.uni-freiburg.de
Connection: Keep-Alive
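A backend that wants to know the public prefix could read it from the Referer header. The following is a minimal sketch with a hypothetical helper parse_request. It splits a raw request, such as the ones printed by the test server at the end of this page, into request line and headers, and then extracts the path of the Referer URL. Keep in mind that the browser sets the Referer header and does not always send it (for example, not for a URL typed directly into the address bar). For this reason, the sketch falls back to an empty prefix.

```python
from urllib.parse import urlparse


def parse_request(raw):
    """Hypothetical helper: split raw request bytes into (method, path, headers).

    Header names are lowercased, since HTTP header names are case-insensitive.
    """
    head = raw.decode("utf-8", errors="replace").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return method, path, headers


# Shortened version of the request shown above.
raw = (b"GET /style.css HTTP/1.1\r\n"
       b"Host: vulcano.informatik.privat:8400\r\n"
       b"Referer: http://aqqu.cs.uni-freiburg.de/proxy-test\r\n"
       b"X-Forwarded-Host: aqqu.cs.uni-freiburg.de\r\n"
       b"\r\n")

method, path, headers = parse_request(raw)
# Path component of the Referer URL, or "" if there is no Referer header.
prefix = urlparse(headers.get("referer", "")).path
print(method, path, prefix)  # GET /style.css /proxy-test
```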
Experimenting with a proxy without the redirect
Without the redirect (the RewriteRule line) in the Apache configuration above, things can get tricky when a user enters http://aqqu.cs.uni-freiburg.de/proxy-test into the browser. This section tries to explain what happens.
The last section of this page contains a simple web server (written in Python 3) that serves an index.html, which in turn loads a file with the (relative) URL style.css and executes some JavaScript that issues a GET request with the (relative) URL ?query=hello.
When entering http://aqqu.cs.uni-freiburg.de/proxy-test into the browser, the following URLs are retrieved (see the Network tab of the browser's developer tools, which can be opened with F12, or in Chrome with Ctrl+Shift+J):
http://aqqu.cs.uni-freiburg.de/proxy-test
http://aqqu.cs.uni-freiburg.de/style.css
http://aqqu.cs.uni-freiburg.de/proxy-test?query=hello
When entering http://aqqu.cs.uni-freiburg.de/proxy-test/ into the browser, that is, with a trailing slash, the following URLs are retrieved:
http://aqqu.cs.uni-freiburg.de/proxy-test/
http://aqqu.cs.uni-freiburg.de/proxy-test/style.css
http://aqqu.cs.uni-freiburg.de/proxy-test/?query=hello
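Both lists can be reproduced offline with Python's urllib.parse.urljoin, which resolves relative URLs essentially the way a browser does (RFC 3986). The sketch below resolves the two relative URLs from index.html against the page URL, once without and once with the trailing slash. The helper name resolved is hypothetical.

```python
from urllib.parse import urljoin

# The two relative URLs used by index.html from the test server.
RELATIVE_URLS = ["style.css", "?query=hello"]


def resolved(page):
    """Resolve each relative URL against the given page URL."""
    return [urljoin(page, url) for url in RELATIVE_URLS]


# Without the trailing slash, "proxy-test" is treated like a file name:
# it is dropped when resolving style.css, but kept for ?query=hello.
for page in ["http://aqqu.cs.uni-freiburg.de/proxy-test",
             "http://aqqu.cs.uni-freiburg.de/proxy-test/"]:
    print(page, resolved(page))
```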
The ?query=hello request can be handled, although it does require some extra care on the side of the web application running behind vulcano.informatik.privat:8400 to deal with the fact that there might or might not be a slash before the ?query=hello.
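One way for the web application to treat both variants the same is to split the request target into a path and a query string, and to treat an empty path like /. Here is a minimal sketch, assuming the backend sees the query either as ?query=hello or as /?query=hello. The helper name split_target is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def split_target(target):
    """Hypothetical helper: return (path, params) for a request target."""
    parts = urlsplit(target)
    # An empty path (no slash before the "?") is treated like "/".
    path = parts.path or "/"
    return path, parse_qs(parts.query)


print(split_target("?query=hello"))   # ('/', {'query': ['hello']})
print(split_target("/?query=hello"))  # ('/', {'query': ['hello']})
```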
The style.css is more problematic. To be able to handle both cases, the web application must detect whether the prefix proxy-test is there, and if so, remove it for this and all further requests to the app. This is complicated by the fact that at first, the web app does not know the name of the prefix. However, it can retrieve this information from the very first request it gets and then make use of it for all further requests.
So it's possible, but a bit messy, and it requires extra care on the side of the web application. It's certainly easier to just add the redirect (the RewriteRule line) to the Apache configuration, as shown above. The web application then still has to make sure that all the URLs it produces are relative (or absolute with the right path), but that is relatively easy.
A simple web server for testing proxies
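The following code was tested with Python 3.6.9 and should work with Python 3.5 or later. It returns the same HTML page for every request and prints each request it receives, so you can see exactly what arrives through the proxy.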
"""
Copyright 2020, University of Freiburg
Chair of Algorithms and Data Structures.
Hannah Bast <bast@cs.uni-freiburg.de>
"""

import sys
import socket

# The HTML page that is returned for *every* request (including style.css
# and ?query=hello), so that all requests show up in the output below.
index_html = """
<html>
<head>
<link href="style.css" rel="stylesheet" type="text/css">
<script>
var xmlHttp = new XMLHttpRequest();
xmlHttp.open("GET", "?query=hello", false);
xmlHttp.send(null);
console.log(xmlHttp.responseText);
</script>
</head>
<body>
<p>Test page returned by simple_server.py</p>
<p>Check JS console for style.css</p>
</body>
</html>
"""

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 simple_server.py <port>")
        sys.exit(1)
    port = int(sys.argv[1])

    # Listen for TCP connections on all interfaces.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", port))
    s.listen()

    while True:
        print("\x1b[1mWaiting on port %d\x1b[0m ... " % port,
              end="", flush=True)
        connection, client_address = s.accept()
        connection.settimeout(5.0)
        print("connection from %s" % client_address[0], end="", flush=True)
        request = connection.recv(8192)  # reads only one batch
        print(", request data is:")
        # Print the raw request (request line plus headers).
        print(request.decode("utf-8", errors="replace"))
        # Send back the same HTML page, whatever was requested.
        result = index_html.encode("utf-8")
        headers = "HTTP/1.1 200 OK\r\n" \
                  "Content-Length: %d\r\n" \
                  "Content-Type: text/html\r\n" \
                  "\r\n" % len(result)
        connection.sendall(headers.encode("utf-8"))
        connection.sendall(result)
        connection.close()