Decoding URL encoding (percent encoding)

I want to decode URL encoding. Is there any built-in tool for doing this, or could anyone provide me with sed code that will do it?

I did search a bit through unix.stackexchange.com and on the internet, but I couldn’t find any command-line tool for decoding URL encoding.

What I want to do is simply edit a text file in place so that:

  • %21 becomes !
  • %23 becomes #
  • %24 becomes $
  • %26 becomes &
  • %27 becomes '
  • %28 becomes (
  • %29 becomes )

And so on.

Asked By: DisplayName


Found these Python one-liners that do what you want:

Python2

$ alias urldecode='python -c "import sys, urllib as ul; \
    print ul.unquote_plus(sys.argv[1])"'

$ alias urlencode='python -c "import sys, urllib as ul; \
    print ul.quote_plus(sys.argv[1])"'

Python3

$ alias urldecode='python3 -c "import sys, urllib.parse as ul; \
    print(ul.unquote_plus(sys.argv[1]))"'

$ alias urlencode='python3 -c "import sys, urllib.parse as ul; \
    print(ul.quote_plus(sys.argv[1]))"'

Example

$ urldecode 'q+werty%3D%2F%3B'
q werty=/;

$ urlencode 'q werty=/;'
q+werty%3D%2F%3B


Answered By: slm

There is a built-in function for that in the Python standard library. In Python 2, it’s urllib.unquote.

decoded_url=$(python2 -c 'import sys, urllib; print urllib.unquote(sys.argv[1])' "$encoded_url")

Or to process a file:

python2 -c 'import sys, urllib; print urllib.unquote(sys.stdin.read())' <file >file.new &&
mv -f file.new file

In Python 3, it’s urllib.parse.unquote.

decoded_url=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$encoded_url")

Or to process a file:

python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' <file >file.new &&
mv -f file.new file

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to process a file:

perl -pli -MURI::Escape -e '$_ = uri_unescape($_)' file

If you want to stick to POSIX portable tools, it’s awkward, because the only serious candidate is awk, which doesn’t parse hexadecimal numbers. See Using awk printf to urldecode text for examples with common awk implementations, including BusyBox.
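
As a rough illustration of that awkwardness (my own sketch, not taken from the linked question): a POSIX awk can sidestep hex parsing entirely by precomputing a lookup table keyed on the two hex digits. It decodes %XX escapes only, does not touch +, and the handling of bytes above 127 depends on the awk implementation and locale.

awk '
BEGIN {
    # build a table mapping "21" -> "!", "2f" -> "/", ... without parsing hex
    for (i = 1; i < 256; i++)
        chr[sprintf("%02x", i)] = sprintf("%c", i)
}
{
    out = ""; s = $0
    while ((p = index(s, "%")) > 0 && length(s) >= p + 2) {
        code = tolower(substr(s, p + 1, 2))
        if (code in chr) {
            out = out substr(s, 1, p - 1) chr[code]
            s = substr(s, p + 3)
        } else {
            # not a valid %XX escape: keep the % literally and continue
            out = out substr(s, 1, p)
            s = substr(s, p + 1)
        }
    }
    print out s
}' file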

And another Perl approach:

#!/usr/bin/env perl
use URI::Encode;
my $uri     = URI::Encode->new( { encode_reserved => 0 } );
while (<>) {
    print $uri->decode($_);
}

You will need to install the URI::Encode module. On my Debian, I could simply run

sudo apt-get install liburi-encode-perl

Then, I ran the script above on a test file containing:

http://foo%21asd%23asd%24%26asd%27asd%28asd%29

The result was (I had saved the script as foo.pl):

$ ./foo.pl
http://foo!asd#asd$&asd'asd(asd)
Answered By: terdon

If you want to use a simple-minded sed command, then use the following:

sed -e 's/%21/!/g' -e 's/%23/#/g' -e 's/%24/$/g' -e 's/%26/&/g' -e "s/%27/'/g" -e 's/%28/(/g' -e 's/%29/)/g'

But it is more convenient to put the substitutions in a script file (say, sedscript):

s/%21/!/g
s/%23/#/g
s/%24/$/g
s/%26/&/g
s/%27/'/g
s/%28/(/g
s/%29/)/g

Then run sed -f sedscript < old > new, which will produce the desired output.
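
Since the question asks for in-place editing, note that with GNU sed (the -i option is an extension, not POSIX) you can apply the same script directly to the file, for example:

sed -i -f sedscript file.txt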


For convenience, the urlencode command is also available directly in the gridsite-clients package (which can be installed with sudo apt-get install gridsite-clients on Ubuntu/Debian systems).

NAME

    urlencode – convert strings to or from URL-encoded form

SYNOPSIS

    urlencode [-m|-d] string [string ...]

DESCRIPTION

    urlencode encodes strings according to RFC 1738.

    That is, the characters A-Z a-z 0-9 . _ and - are
    passed through unmodified, but all other characters are represented as %HH,
    where HH is their two-digit upper-case hexadecimal ASCII representation.
    For example, the URL http://www.gridpp.ac.uk/ becomes http%3A%2F%2Fwww.gridpp.ac.uk%2F

    urlencode converts each character in all the strings
    given on the command line.  If multiple strings are given,
    they are concatenated with separating spaces before conversion.

OPTIONS

    -m

      Instead of full conversion, do GridSite “mild URL encoding”
      in which A-Z a-z 0-9 . = - _ @ and / are passed through unmodified. 
      This results in slightly more human-readable strings
      but the application must be prepared to create or simulate
      the directories implied by any slashes.

    -d

      Do URL-decoding rather than encoding, according to RFC 1738. 
      %HH and %hh strings are converted and other characters are passed through
      unmodified, with the exception that + is converted to space.

Example of decoding URL:

$ urlencode -d "http%3a%2f%2funix.stackexchange.com%2f"
http://unix.stackexchange.com/

$ urlencode -d "Example: %21, %22, . . . , %29 etc"
Example: !, ", . . . , ) etc
Answered By: Pandya

Perl one liner:

$ perl -pe 's/%(\w\w)/chr hex $1/ge'

Example:

$ echo '%21%22' | perl -pe 's/%(\w\w)/chr hex $1/ge'
!"

or, if you want to ignore non-hex sequences like %zz (which the above mangles):

$ perl -pe 's/%([[:xdigit:]]{2})/chr hex $1/ge'
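
For instance (my own example, not from the original answer), the stricter pattern decodes valid escapes and leaves the invalid sequence untouched:

$ echo '%41%zz%42' | perl -pe 's/%([[:xdigit:]]{2})/chr hex $1/ge'
A%zzB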
Answered By: Adrian Pronk

GNU Awk

#!/usr/bin/awk -nf
# Requires GNU awk: -n lets "0x.." strings be treated as hex, and @include loads the ord library (for chr)
@include "ord"
BEGIN {
   # treat every %XX escape as the record separator
   RS = "%.."
}
{
   # print the text before the escape, then the character it encodes;
   # RT holds the text that matched RS (the %XX itself)
   printf "%s", $0
   if (RT != "") {
      printf "%s", chr("0x" substr(RT, 2))
   }
}
Answered By: Zombo

Shell-only:

$ x='a%20%25%e3%81%82';printf "${x//%/\x}"
a %あ

Add -- or a %b format to prevent strings that start with a dash from being treated as printf options.

In zsh, ${x//%/a} adds a to the end, but ${x//\%/a} replaces % with a.

Answered By: Lri

An answer in (mostly POSIX) shell:

$ input='%21%22'
$ printf "`printf "%s\n" "$input" | sed -e 's/+/ /g' -e 's/%\(..\)/\\\\x\1/g'`"
!"

Explanation:

  • -e 's/+/ /g' transforms each + into a space (as described in the URL-encoding norm)
  • -e 's/%\(..\)/\\\\x\1/g' transforms each %XX into \xXX. Notice that one of the \ will be removed by the quoting rules.
  • The inner printf is just there to pass the input to sed. We may replace it with any other mechanism.
  • The outer printf interprets the \xXX sequences and displays the result.

Edit:

Since % should always be interpreted in URLs, it is possible to simplify this answer. In addition, I think it is cleaner to use xargs instead of backquotes (thanks to @josch).

$ input='%21%22+%25'
$ printf "%s\n" "$input" | sed -e 's/+/ /g; s/%/\\x/g' | xargs -0 printf
!" %

Unfortunately (as @josch noticed), none of these solutions is POSIX compliant, since the \x escape sequence is not defined in POSIX.

Answered By: Jérôme Pouiller

Here are the relevant bits from another script I’ve written before (shamelessly lifted from my youtube.com download script in another answer). It uses sed and the shell to build up a working urldecode.

set \! \" \# \$ \% \& \' \( \) \* \  \+ \, \/ \: \; \= \? \@ \[ \]
for c do set "$@" "'$c" "$c"; shift; done
curl -s "$url" | sed 's/\\u0026/\&/g;'"$(
    printf 's/%%%X/\\%s/g;' "$@"
)"

I won’t swear it’s comprehensive (in fact I doubt it), but it handled YouTube reliably enough.
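
For reference, this is roughly the start of the sed program that the printf loop generates (my own illustration, assuming the escaping shown above), one substitution per character in the set line:

s/%21/\!/g;s/%22/\"/g;s/%23/\#/g;s/%24/\$/g;s/%25/\%/g;s/%26/\&/g;

and so on for the rest of the characters.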

Answered By: mikeserv

sed

Try the following command line:

$ sed 's@+@ @g;s@%@\\x@g' file | xargs -0 printf "%b"

or the following alternative using echo -e:

$ sed -e 's/%\([0-9A-F][0-9A-F]\)/\\x\1/g' file | xargs echo -e

Note: The above syntax may not convert + to spaces, and can eat all the newlines.


You may define it as alias and add it to your shell rc files:

$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'

Then every time when you need it, simply go with:

$ echo "http%3A%2F%2Fwww" | urldecode
http://www

Bash

When scripting, you can use the following syntax:

input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\x}")

However, the above syntax won’t handle plus signs (+) correctly, so you have to replace them with spaces via sed or, as suggested by @isaac, use the following syntax:

decoded=$(input=${input//+/ }; printf "${input//%/\x}")

You can also use the following urlencode() and urldecode() functions:

urlencode() {
    # urlencode <string>
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
}
 
urldecode() {
    # urldecode <string>
 
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\x}"
}

Note that above urldecode() assumes the data contains no backslash.
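
For example (my own invocation of the functions above; note that this urlencode() emits %20 for spaces rather than +, while urldecode() accepts both forms):

$ urlencode 'q werty=/;'
q%20werty%3D%2F%3B

$ urldecode 'q+werty%3D%2F%3B'
q werty=/;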

Here is Joel’s similar version, found at: https://github.com/sixarm/urldecode.sh


bash + xxd

A Bash function using the xxd tool:

urlencode() {
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf "$c" ;;
      *) printf "$c" | xxd -p -c1 | while read x; do printf "%%%s" "$x"; done ;;
    esac
  done
}

Found in cdown’s gist, also on Stack Overflow.
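
Example invocation (my own; note that xxd emits lowercase hex digits, unlike the printf '%%%02X' variant above):

$ urlencode 'q werty=/;'
q%20werty%3d%2f%3b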


PHP

Using PHP you can try the following command:

$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));'  # or read from php://stdin
oil and gas

or just:

php -r 'echo urldecode("oil+and+gas");'

Use -R for multi-line input.
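
For example, to decode a file line by line (my own sketch; with -R, PHP runs the given code once per input line and exposes the current line in $argn):

php -R 'echo urldecode($argn), PHP_EOL;' < file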


Perl

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to process a file:

perl -i -MURI::Escape -pe '$_ = uri_unescape($_)' file

awk

Try anon’s solution:

awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

Note: the -n and -i parameters are specific to GNU awk.

Try Stéphane Chazelas’ urlencode solution:

awk -v RS='&#[0-9]+;' -v ORS= '1;RT{printf("%%%02X", substr(RT,3))}'

See: Using awk printf to urldecode text.

decoding file names

If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).

Answered By: kenorb

I can’t comment on the best answer in this thread, so here is mine.

Personally, I use these aliases for URL encoding and decoding:

alias urlencode='python -c "import urllib, sys; print urllib.quote(  sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'

alias urldecode='python -c "import urllib, sys; print urllib.unquote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'

Both commands let you convert data passed as a command-line argument or read from standard input: each one-liner checks whether a command-line argument was given (even an empty one) and uses it, otherwise it reads standard input.


update 2017-05-23 (slash encoding)

In response to @Bevor’s comment:

If you also need to encode the slash, pass an empty string as the second argument to the quote function (its set of "safe" characters); the slash will then be encoded as well.

So the final urlencode alias in bash looks like this:

alias urlencode='python -c "import urllib, sys; print urllib.quote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1], \"\")"'

Example

$ urlencode "Проба пера/Pen test"
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test

$ echo "Проба пера/Pen test" | urlencode
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test

$ urldecode %D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
Проба пера/Pen test

$ echo "%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test" | urldecode
Проба пера/Pen test

$ urlencode "Проба пера/Pen test" | urldecode
Проба пера/Pen test

$ echo "Проба пера/Pen test" | urlencode | urldecode
Проба пера/Pen test
Answered By: DIG mbl

Here is a BASH function to do exactly that:

function urldecode() {
        echo -ne $(echo -n "$1" | sed -E "s/%/\\\\x/g")
}
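
Example (my own, with the sed escaping fixed as above); note that it only handles %XX escapes, not +:

$ urldecode 'foo%20bar%21'
foo bar!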
Answered By: Adi D

Another solution using Ruby (the accepted Python answer wasn’t working for me):

alias urldecode='ruby -e "require \"cgi\"; puts CGI.unescape(ARGV[0])"'
alias urlencode='ruby -e "require \"cgi\"; puts CGI.escape(ARGV[0])"'

Example

$ urldecode 'q+werty%3D%2F%3B'
q werty=/;

$ urlencode 'q werty=/;'
q+werty%3D%2F%3B
Answered By: Shiyason

The simple solution for short strings (shell is slowwww):

$ str='q+werty%3D%2F%3B'

$ a=${str//+/ };printf "$(echo "${a//%/\x}")\n"

q werty=/;
Answered By: user232326

From my layman’s research of the topic, it appears that implementations of percent-encoding are susceptible to ambiguity in edge cases, such as the character encoding being different than expected, characters left unescaped, the query part being encoded differently, the potential presence of binary and non-ASCII characters, and so on. So some analysis of, and assumptions about, the input data are necessary.

The closest thing to a dedicated tool is the respective functions in programming languages, such as Python’s functions from the urllib module, which make some sane assumptions about the URL data, as evidenced by the comments in CPython’s code. That’s why I find the current top answer to be good.

As a matter of exercise, I implemented a similar alias with GNU Guile, since it is in the path by default on a GNU Guix system, where Python is not necessarily present. I cannot comment on its reliability in comparison to the Python, Perl, or other solutions. The documentation suggests that one should preferably split the URL on ?, &, and =, process the query separately from the path, split the path into segments with a dedicated function, and still be ready for errors. However, I am satisfied with the results on full URL strings copied from a browser.

alias urldecode='guile -c "(use-modules (web uri))
                           (display (uri-decode (cadr (command-line))))
                           (newline)"'

The (web uri) module provides the uri-decode function for decoding URIs. command-line passes the arguments, and cadr picks the second item in the list (the URL, which is the first argument after the executable name itself, i.e. guile).

$ urldecode "http://ephsheir.uhsp.edu.ua/bitstream/handle/8989898989/2850/%d0%9c%d0%b0%d0%ba%d0%b5%d1%82%20%d0%9d%d0%b0%d1%80%d0%be%d0%b4%d0%bd%d0%b8%20%d0%bd%d0%b0%d0%b7%d0%b2%d0%b8.pdf?sequence=2&isAllowed=y"
http://ephsheir.uhsp.edu.ua/bitstream/handle/8989898989/2850/Макет Народни назви.pdf?sequence=2&isAllowed=y

A one-liner when not having an alias:

$ guile -c "(use-modules (web uri)) (display (uri-decode (cadr (command-line)))) (newline)" "http://ephsheir.uhsp.edu.ua/bitstream/handle/8989898989/2850/%d0%9c%d0%b0%d0%ba%d0%b5%d1%82%20%d0%9d%d0%b0%d1%80%d0%be%d0%b4%d0%bd%d0%b8%20%d0%bd%d0%b0%d0%b7%d0%b2%d0%b8.pdf?sequence=2&isAllowed=y"
http://ephsheir.uhsp.edu.ua/bitstream/handle/8989898989/2850/Макет Народни назви.pdf?sequence=2&isAllowed=y
Answered By: Roman Riabenko

AIX/Solaris

This recently came up again and I wanted a non-pythonic version that’d work on AIX/Solaris etc.

INPUTSTRING="test%20%21%22%23%24%25%3f%2f%2e%5ctest"
for C in `echo "${INPUTSTRING}" | sed 's/%\(..\)/ %\1 /g'`
do
  case "$C" in
    %*)
      echo $C | sed 's/%//' | (echo 16i; tr '[:lower:]' '[:upper:]'; echo P) | dc
      ;; 
    *)
      printf "%s" "$C"
      ;;
  esac
done

In essence, it tokenizes the string and then, for each token: if it is not a % sequence, it just prints it;
otherwise it trims the % and runs the hex value through dc with an input radix of 16 (16i).

Relies on dc, sed and POSIX features of printf (no \x encoding).

Here it is as a "one-liner":

for C in `echo "test%20%21%22%23%24%25%3f%2f%2e%5ctest" | sed 's/%\(..\)/ %\1 /g'`; do case "$C" in %*) echo $C | sed 's/%//' | (echo 16i; tr '[:lower:]' '[:upper:]'; echo P) | dc ;; *) printf "%s" "$C" ;; esac; done
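
Run against the INPUTSTRING above, the loop prints the decoded string (my own walk-through of the logic; note there is no trailing newline):

test !"#$%?/.\test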
Answered By: twistedroutes

This solution doesn’t use sed but zsh (specifically, a function shipped with oh-my-zsh): you can use the function omz_urldecode to turn any %## sequence back into a readable string:

omz_urldecode 'http://example.com/some%23'
#Output:
http://example.com/some#

You can test the %## examples from the question with a for loop:

for ((i=21; i<=29; i++)); do 
  omz_urldecode "http://example.com/exa_%${i}mple"
done

Output:

http://example.com/exa_!mple
http://example.com/exa_"mple
http://example.com/exa_#mple
http://example.com/exa_$mple
http://example.com/exa_%mple
http://example.com/exa_&mple
http://example.com/exa_'mple
http://example.com/exa_(mple
http://example.com/exa_)mple

If you have oh-my-zsh installed on your system, you can use this command in zsh to figure out where that function is defined:

type -a omz_urldecode
omz_urldecode is a shell function from /home/user/.oh-my-zsh/lib/functions.zsh

And to see what the function contains:

type -f omz_urldecode
#Output
omz_urldecode () {
        emulate -L zsh
        local encoded_url=$1 
        local caller_encoding=$langinfo[CODESET] 
        local LC_ALL=C 
        export LC_ALL
        local tmp=${encoded_url:gs/+/ /} 
        tmp=${tmp:gs/\\/\\\\/}
        tmp=${tmp:gs/%/\\x/}
        local decoded="$(printf -- "$tmp")" 
        local -a safe_encodings
        safe_encodings=(UTF-8 utf8 US-ASCII) 
        if [[ -z ${safe_encodings[(r)$caller_encoding]} ]]
        then
                decoded=$(echo -E "$decoded" | iconv -f UTF-8 -t $caller_encoding) 
                if [[ $? != 0 ]]
                then
                        echo "Error converting string from UTF-8 to $caller_encoding" >&2
                        return 1
                fi
        fi
        echo -E "$decoded"
}
Answered By: Edgar Magallon

I must have changed my approach since my earlier comment, probably in favor of a quick tool install instead of coding and/or any manual setup.

Now I use this:

npm i -g url-cli
xout | url -dp | xio; # Linux + Aliases
gc; gc | tr -d '\n' | url -dp | pc; gc; # Windows-Cygwin + Aliases
Answered By: Pysis

Using Raku (formerly known as Perl_6)

Using Raku’s URI::Encode module, which purports to be RFC 3986 compliant (just like Perl 5’s URI::Encode and/or URI::Escape modules):

~$ raku -MURI::Encode -ne 'put uri_decode($_);'  file

Sample Input:

http://www.example.com/?name=john%20doe&age=54

Sample Output:

http://www.example.com/?name=john doe&age=54

Note: if you’re looking for a more full-blown URL parser, try Raku’s URL module. Sample output (below) with the same input as above:

~$ raku -MURL -ne 'my $url = URL.new($_); .raku.put for $url;'  file
URL.new(scheme => "http", username => Str, password => Str, hostname => "www.example.com", port => Int, path => [], query => {:age("54"), :name("john%20doe")}, fragment => Str)

With the second approach, you can extract only the elements you really need decoded, like the URL query subcomponent, and decode them as key/value pairs:

~$ raku -MURL -MURI::Encode -ne 'my $url = URL.new($_); for $url.query.kv -> $k,$v {say $k => uri_decode($v)};'  file
age => 54
name => john doe

https://github.com/raku-community-modules/URI-Encode
https://raku.land/cpan:TYIL/URL
https://raku.org

Answered By: jubilatious1