Many of our projects (student projects as well as research projects) run on one of our local machines, for example on vulcano.informatik.privat:8400, but should be accessible from the outside, for example via http://aqqu.cs.uni-freiburg.de/proxy-test. This is non-trivial when the latter URL is not a top-level URL, like http://aqqu.cs.uni-freiburg.de, but contains a path, like the /proxy-test.
Apache Config
The first step is to find out which virtual host deals with the domain. In the example above, it is the Apache virtual host configured in filicudi:/etc/apache2/sites-available/aqqu.conf, which contains (among others) the following lines:
{{{
<VirtualHost *:80>
  ServerName aqqu.informatik.uni-freiburg.de
  ServerAlias aqqu aqqu.cs.uni-freiburg.de
  [...]
  RewriteEngine On
  RewriteRule /proxy-test$ /proxy-test/ [R]
  ProxyPass /proxy-test http://vulcano.informatik.privat:8400
  ProxyPassReverse /proxy-test http://vulcano.informatik.privat:8400
  [...]
</VirtualHost>
}}}
The effect of the RewriteRule line is that when somebody types http://aqqu.cs.uni-freiburg.de/proxy-test in the browser (without a trailing slash), the browser will be redirected to http://aqqu.cs.uni-freiburg.de/proxy-test/. The reason for this redirect is that the missing slash is problematic for many servers. If you are curious, see the section "Experimenting with a proxy without the redirect" further down on this page.
The effect of the ProxyPass line is that when somebody enters http://aqqu.cs.uni-freiburg.de/proxy-test/<something> in the browser, then vulcano.informatik.privat:8400 gets the request /<something>. Due to the redirect described in the previous paragraph, when somebody enters http://aqqu.cs.uni-freiburg.de/proxy-test (without a trailing slash), the combined effect of the RewriteRule and the ProxyPass line is that vulcano.informatik.privat:8400 gets the request /.
Note that small details in the configuration file matter here, in particular whether or not a path or URL in a line ends with a trailing /.
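The combined effect of the RewriteRule and ProxyPass lines can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not Apache's actual matching logic; the function name handle and the shape of its return value are made up for illustration.

```python
import re

# Values taken from the virtual host configuration shown above.
PREFIX = "/proxy-test"
BACKEND = "http://vulcano.informatik.privat:8400"


def handle(path):
    """Simplified model of what happens to a request path (hypothetical helper).

    Returns a pair (action, target), where action is "redirect", "proxy" or "local".
    """
    # RewriteRule /proxy-test$ /proxy-test/ [R]
    # A path ending in /proxy-test is answered with a redirect to /proxy-test/.
    if re.search(r"/proxy-test$", path):
        return ("redirect", "/proxy-test/")
    # ProxyPass /proxy-test http://vulcano.informatik.privat:8400
    # The prefix is cut off and the rest is appended to the backend URL.
    if path.startswith(PREFIX + "/"):
        return ("proxy", BACKEND + path[len(PREFIX):])
    # Everything else is not touched by these two directives.
    return ("local", path)


if __name__ == "__main__":
    for p in ("/proxy-test", "/proxy-test/", "/proxy-test/style.css", "/style.css"):
        print(p, "->", handle(p))
```

In this model, /proxy-test/style.css is passed to the backend as /style.css, while a plain /style.css never reaches the backend at all.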
TODO: explain the effect of the ProxyPassReverse line and why and when it is needed.
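As background for the ProxyPassReverse line: according to the Apache mod_proxy documentation, this directive adjusts the URL in the Location, Content-Location and URI headers of redirect responses coming from the backend, so that the browser is not sent to the internal address. The following Python sketch models only the Location case in a much simplified way. The constant PUBLIC and the function rewrite_location are hypothetical names used for illustration, not part of Apache.

```python
# Internal address of the web application, as in the ProxyPassReverse line above.
BACKEND = "http://vulcano.informatik.privat:8400"
# Public address under which the application is reachable (assumed for this sketch).
PUBLIC = "http://aqqu.cs.uni-freiburg.de/proxy-test"


def rewrite_location(location):
    """Map a Location header value from the backend to the public URL.

    Simplified model: only headers that start with the backend URL are changed.
    """
    if location.startswith(BACKEND):
        return PUBLIC + location[len(BACKEND):]
    return location


if __name__ == "__main__":
    # A redirect issued by the backend that mentions its internal address ...
    print(rewrite_location("http://vulcano.informatik.privat:8400/login"))
    # ... and one that points elsewhere and is left alone.
    print(rewrite_location("http://example.org/other"))
```

Without such a rewrite, a redirect issued by the web application itself could point the browser at vulcano.informatik.privat:8400, which is not reachable from the outside.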
Requirements of the web application
It is important that the web application only uses relative paths, or absolute paths that take the path in the browser URL into account. For example, assume somebody entered http://aqqu.cs.uni-freiburg.de/proxy-test/ into the browser and the first thing the web application does is to serve the following HTML file:
{{{
<html>
<head>
<link href="style.css" rel="stylesheet" type="text/css">
<script src="script.js"></script>
</head>
<body>
<h1>A simple HTML page</h1>
</body>
</html>
}}}
Then the browser will send two GET requests with the following URLs:
{{{
http://aqqu.cs.uni-freiburg.de/proxy-test/style.css
http://aqqu.cs.uni-freiburg.de/proxy-test/script.js
}}}
This is good because these URLs match the proxy configuration from above. To serve these URLs, Apache will request the following URLs:
{{{
http://vulcano.informatik.privat:8400/style.css
http://vulcano.informatik.privat:8400/script.js
}}}
The resulting GET requests received by vulcano.informatik.privat:8400 will then be as follows. Note that the web application does not get to know the proxy-test prefix here.
{{{
GET /style.css
GET /script.js
}}}
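To check whether a page satisfies this requirement, one can resolve each href and src value against the URL in the browser and see whether the result still lies below /proxy-test/. The following is a minimal sketch using only the Python standard library. The names LinkCollector and check_links are made up for illustration, urljoin is used as a stand-in for the URL resolution done by the browser, and links to other hosts are not treated specially.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# URL in the browser and the proxied path prefix, as in the example above.
PAGE_URL = "http://aqqu.cs.uni-freiburg.de/proxy-test/"
PREFIX = "/proxy-test/"


class LinkCollector(HTMLParser):
    """Collect the values of all href and src attributes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def check_links(html, page_url=PAGE_URL, prefix=PREFIX):
    """Return a dict mapping each link to True if it resolves below the prefix."""
    collector = LinkCollector()
    collector.feed(html)
    result = {}
    for link in collector.links:
        # Resolve the link the way a browser would, relative to the page URL.
        path = urlsplit(urljoin(page_url, link)).path
        result[link] = path.startswith(prefix)
    return result


if __name__ == "__main__":
    page = ('<link href="style.css" rel="stylesheet">'
            '<script src="/script.js"></script>'
            '<img src="/proxy-test/logo.png">')
    for link, ok in check_links(page).items():
        print(link, "->", "ok" if ok else "would bypass the proxy")
```

Here style.css and /proxy-test/logo.png stay below the prefix, whereas /script.js resolves to http://aqqu.cs.uni-freiburg.de/script.js and therefore does not match the proxy configuration.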
Experimenting with a proxy without the redirect
Without the redirect, things get tricky: when the URL in the browser is http://aqqu.cs.uni-freiburg.de/proxy-test, the missing trailing / causes a number of problems. This section tries to explain what happens.
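The core of the problem is how relative URLs are resolved: the last path segment of the base URL is replaced, so without the trailing / the proxy-test part is lost. This can be tried out with urljoin from the Python standard library, used here as a stand-in for the browser. The helper name resolve is made up for illustration.

```python
from urllib.parse import urljoin


def resolve(base, reference):
    """Resolve a relative reference against the URL shown in the browser."""
    return urljoin(base, reference)


if __name__ == "__main__":
    with_slash = "http://aqqu.cs.uni-freiburg.de/proxy-test/"
    without_slash = "http://aqqu.cs.uni-freiburg.de/proxy-test"
    # With the trailing slash, style.css stays below /proxy-test/.
    print(resolve(with_slash, "style.css"))
    # Without it, the segment "proxy-test" is replaced by "style.css".
    print(resolve(without_slash, "style.css"))
```

The second call yields http://aqqu.cs.uni-freiburg.de/style.css. Unless some other rule in the virtual host handles that path, the request does not match the ProxyPass line and never reaches the web application.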
A simple web server for testing proxies
The following code was tested with Python 3.6.9 and should work with any recent Python 3.x (3.5 or later, because s.listen() is called without a backlog argument).
{{{
"""
Copyright 2020, University of Freiburg
Chair of Algorithms and Data Structures.
Hannah Bast <bast@cs.uni-freiburg.de>
"""

import sys
import socket

# The page that is returned for every request, whatever the requested path.
index_html = """
<html>
<head>
<link href="style.css" rel="stylesheet" type="text/css">
<script>
  var xmlHttp = new XMLHttpRequest()
  xmlHttp.open("GET", "?query=hello", false)
  xmlHttp.send(null)
  console.log(xmlHttp.responseText)
</script>
</head>
<body>
<p>Test page returned by simple_server.py</p>
<p>Check JS console for style.css</p>
</body>
</html>
"""

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 simple_server_2.py <port>")
        sys.exit(1)
    port = int(sys.argv[1])

    # Listen on all interfaces; SO_REUSEADDR allows a quick restart on the same port.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", port))
    s.listen()

    while True:
        print("\x1b[1mWaiting on port %d\x1b[0m ... " % port,
              end="", flush=True)
        connection, client_address = s.accept()
        connection.settimeout(5.0)
        print("connection from %s" % client_address[0], end="", flush=True)

        # Print the raw request, so that one can see which path arrives here.
        request = connection.recv(8192)  # reads only one batch
        print(", request data is:")
        print(request.decode("utf-8"))

        # Answer every request with the test page.
        result = index_html.encode("utf-8")
        headers = "HTTP/1.1 200 OK\r\n" \
                  "Content-Length: %d\r\n" \
                  "Content-Type: text/html\r\n" \
                  "\r\n" % len(result)
        connection.sendall(headers.encode("utf-8"))
        connection.sendall(result)
        connection.close()
}}}