Markus Barth

How can I guess the filetype whithout depending on the file extension, for instance on uploaded files?

Most file formats use a signature also called file magic or magic number which is simply a specific sequence of bytes. For instance PDF files always start with %PDF- plus version number. All you have to do is to compare the first bytes of the file with the signature.

The following script guesses a series of files (you can extend the list of signatures as you wish).

from collections import namedtuple

FileType = namedtuple('DataType', ['type', 'subtype', 'name'])

FT_UNKNOWN = 0
FT_SQLITE3 = 1
FT_BZIP2 = 2
FT_ZIP = 3
FT_RAR = 4
FT_7Z = 5
FT_GZIP = 6
FT_RATECACHE = 7
FT_TAR = 8
FT_SCRIPT = 9
FT_PS = 10
FT_PDF = 11
FT_XML = 12


signatures = {b"\x53\x51\x4c\x69\x74\x65\x20\x66\x6f\x72\x6d\x61\x74\x20\x33\x00": FileType(FT_SQLITE3, None, "sqlite3"),
             b"\x42\x5A\x68": FileType(FT_BZIP2, None, "bzip2"),
             b"\x50\x4B\x03\x04": FileType(FT_ZIP, None, "zip"),
             b"\x50\x4B\x05\x06": FileType(FT_ZIP, "Empty", "zip (empty)"),
             b"\x50\x4B\x07\x08": FileType(FT_ZIP, "MultiVolume", "zip (multi-volume)"),
             b"\x52\x61\x72\x21\x1A\x07\x00": FileType(FT_RAR, "Version 1.5+", "rar (1.5 +)"),
             b"\x52\x61\x72\x21\x1A\x07\x01\x00": FileType(FT_RAR, "Version 5.0+", "rar (5.0 +)"),
             b"\x37\x7A\xBC\xAF\x27\x1C": FileType(FT_7Z, None, "7z"),
             b"\x1F\x8B": FileType(FT_GZIP, None, "gzip"),
             b"LOSRATES": FileType(FT_RATECACHE, None, "ratecache binary"),
             b"\x75\x73\x74\x61\72": FileType(FT_TAR, None, "tar"),
             b"\x23\x21": FileType(FT_SCRIPT, None, "script"),
             b"%!PS": FileType(FT_PS, None, "PostScript"),
             b"%PDF-": FileType(FT_PDF, None, "Pdf"),
             b"<?xml": FileType(FT_XML, None, "XML")
}

def guess_type(s):
    """Guess the file type based on the magic bytes"""
    for magicbytes, filetype in signatures.items():
        if magicbytes == s[:len(magicbytes)]:
            return filetype
    return FileType(FT_UNKNOWN, None, "unknown")

How can I load test my web server application?

If you are familiar with python you should try locust. Instead of firing up an overbloated Java application you basically design your load test as a python script. You will get through the tutorial and api specs much quicker than going through the manual of most other tools. What you do is you basically "programm users" that interacts with your web site.

I would like to show a progress bar in my command line application.

In order to show a progressbar you will need to know from the beginning or at an early stage the amount of data you will process. If you iterate over a list this is trivial. The downside is that you usuallly want to use a progress bar when you need to process a lot of data and this is exactly when you want to use a generator function instead of a list for iteration. Anyway, let's see, how you can implement a progress bar class first:

import sys

class Progressbar(object):
    def __init__(self, maxval, width=50, show_percent=True, barchar='»'):
        self.maxval = maxval
        self.width = width
        self._value = 0
        self.barchar = barchar
        self.show_percent = show_percent
        self.paint()

    def __del__(self):
        print("\n")

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self,value):
        self._value = value
        self.paint()

    @property
    def percent(self):
        return '{0:5.2f}'.format(self._value/self.maxval*100)

    def paint(self):
        try:
            progress = int(self._value/self.maxval*self.width)
        except ZeroDivisionError:
            pass
        else:
            bar = self.barchar * progress
            filler = ' ' * (self.width - progress)
            if self.show_percent is True:
                output = '\r[{0}{1}] {2}% '.format(bar,filler,self.percent)
            else:
                output = '\r[{0}{1}] '.format(bar,filler)
            sys.stdout.write(output)

    def inc(self):
        self._value += 1
        self.paint()

If you do not know the number of items you are going to process you should still give the user some visual feedback:

class BusyWheel(object):
    def __init__(self, msg = 'items processed'):
        self.step = 0
        self.count = 0
        self.spokes = ['-','\\','|','/']
        self.msg = msg

    def __del__(self):
        print('\n')

    def inc(self):
        self.count += 1
        if self.step == 3:
            self.step = 0
        else:
            self.step += 1

    def paint(self):
        sys.stdout.write('\r{0} {1:5d} {2}'.format(
            self.spokes[self.step], self.count, self.msg))
        self.inc()

Is there a simple http server I can use for development?

If you have the python interpreter installed you can start a simple http file server with python -m http.server. Use python -m http.server --help to see options such as binding to a different address or port.

What does the `if __name__ == '__main__':` do ?

This checks if the module has been invoked as a script, so the code within the if clause will only be executed if the script was called on the command line.

In case the module was imported the code in the if clause will not be executed. This little trick makes it possible to use a module both as a standalone script and an importable module.

¿How do I process a bunch of xml files in a folder?

Code sample:

import glob
import logging
import xml.etree.ElementTree as ET
from pathlib import Path


path = Path('/path/to/xml/files')


def process_files(path):
    ns = {"edf":"http://www.vilauma.de/edf/Hotel",
          "atmt": "http://www.vilauma.de/edf/Hotel/Allotment"}
    for p in glob.glob(str(path / 'hotels' / 'hotelonly' / '*.xml')):
        try:
            tree = ET.parse(p)
        except ET.ParseError:
            logging.error("An error has occured while parsing %s", p)
        else:
            root = tree.getroot()
            basic_data_node = root.find("edf:BasicData", ns)
            try:
                print(basic_data_node.get('Code'))
            except AttributeError:
                logging.error("Missing 'Code' attribute in %s", p)


if __name__ == '__main__':
    process_files(path)

¿How do you alternate True and False?

Code sample:

class Alternator(object):
    def __init__(self, startswith=True):
        self._value = int(not startswith)


    @property
    def value(self):
        self._value = 1 - self._value
        return bool(self._value)


if __name__ == '__main__':
    a = Alternator()
    for i in range(0,10):
        print(a.value)

Frequently asked questions

Python