How can I guess the filetype whithout depending on the file extension, for instance on uploaded files?
Most file formats use a signature also called file magic or magic number which is simply a specific sequence of bytes. For instance PDF files always start with %PDF- plus version number. All you have to do is to compare the first bytes of the file with the signature.
The following script guesses a series of files (you can extend the list of signatures as you wish).
from collections import namedtuple
FileType = namedtuple('DataType', ['type', 'subtype', 'name'])
FT_UNKNOWN = 0
FT_SQLITE3 = 1
FT_BZIP2 = 2
FT_ZIP = 3
FT_RAR = 4
FT_7Z = 5
FT_GZIP = 6
FT_RATECACHE = 7
FT_TAR = 8
FT_SCRIPT = 9
FT_PS = 10
FT_PDF = 11
FT_XML = 12
signatures = {b"\x53\x51\x4c\x69\x74\x65\x20\x66\x6f\x72\x6d\x61\x74\x20\x33\x00": FileType(FT_SQLITE3, None, "sqlite3"),
b"\x42\x5A\x68": FileType(FT_BZIP2, None, "bzip2"),
b"\x50\x4B\x03\x04": FileType(FT_ZIP, None, "zip"),
b"\x50\x4B\x05\x06": FileType(FT_ZIP, "Empty", "zip (empty)"),
b"\x50\x4B\x07\x08": FileType(FT_ZIP, "MultiVolume", "zip (multi-volume)"),
b"\x52\x61\x72\x21\x1A\x07\x00": FileType(FT_RAR, "Version 1.5+", "rar (1.5 +)"),
b"\x52\x61\x72\x21\x1A\x07\x01\x00": FileType(FT_RAR, "Version 5.0+", "rar (5.0 +)"),
b"\x37\x7A\xBC\xAF\x27\x1C": FileType(FT_7Z, None, "7z"),
b"\x1F\x8B": FileType(FT_GZIP, None, "gzip"),
b"LOSRATES": FileType(FT_RATECACHE, None, "ratecache binary"),
b"\x75\x73\x74\x61\72": FileType(FT_TAR, None, "tar"),
b"\x23\x21": FileType(FT_SCRIPT, None, "script"),
b"%!PS": FileType(FT_PS, None, "PostScript"),
b"%PDF-": FileType(FT_PDF, None, "Pdf"),
b"<?xml": FileType(FT_XML, None, "XML")
}
def guess_type(s):
"""Guess the file type based on the magic bytes"""
for magicbytes, filetype in signatures.items():
if magicbytes == s[:len(magicbytes)]:
return filetype
return FileType(FT_UNKNOWN, None, "unknown")
How can I load test my web server application?
If you are familiar with python you should try locust. Instead of firing up an overbloated Java application you basically design your load test as a python script. You will get through the tutorial and api specs much quicker than going through the manual of most other tools. What you do is you basically "programm users" that interacts with your web site.
I would like to show a progress bar in my command line application.
In order to show a progressbar you will need to know from the beginning or at an early stage the amount of data you will process. If you iterate over a list this is trivial. The downside is that you usuallly want to use a progress bar when you need to process a lot of data and this is exactly when you want to use a generator function instead of a list for iteration. Anyway, let's see, how you can implement a progress bar class first:
import sys
class Progressbar(object):
def __init__(self, maxval, width=50, show_percent=True, barchar='»'):
self.maxval = maxval
self.width = width
self._value = 0
self.barchar = barchar
self.show_percent = show_percent
self.paint()
def __del__(self):
print("\n")
@property
def value(self):
return self._value
@value.setter
def value(self,value):
self._value = value
self.paint()
@property
def percent(self):
return '{0:5.2f}'.format(self._value/self.maxval*100)
def paint(self):
try:
progress = int(self._value/self.maxval*self.width)
except ZeroDivisionError:
pass
else:
bar = self.barchar * progress
filler = ' ' * (self.width - progress)
if self.show_percent is True:
output = '\r[{0}{1}] {2}% '.format(bar,filler,self.percent)
else:
output = '\r[{0}{1}] '.format(bar,filler)
sys.stdout.write(output)
def inc(self):
self._value += 1
self.paint()
If you do not know the number of items you are going to process you should still give the user some visual feedback:
class BusyWheel(object):
def __init__(self, msg = 'items processed'):
self.step = 0
self.count = 0
self.spokes = ['-','\\','|','/']
self.msg = msg
def __del__(self):
print('\n')
def inc(self):
self.count += 1
if self.step == 3:
self.step = 0
else:
self.step += 1
def paint(self):
sys.stdout.write('\r{0} {1:5d} {2}'.format(
self.spokes[self.step], self.count, self.msg))
self.inc()
Is there a simple http server I can use for development?
If you have the python interpreter installed you can start a simple http file server with python -m http.server
. Use python -m http.server --help
to see options such as binding to a different address or port.
What does the `if __name__ == '__main__':` do ?
This checks if the module has been invoked as a script, so the code within the if clause will only be executed if the script was called on the command line.
In case the module was imported the code in the if clause will not be executed. This little trick makes it possible to use a module both as a standalone script and an importable module.
¿How do I process a bunch of xml files in a folder?
Code sample:
import glob
import logging
import xml.etree.ElementTree as ET
from pathlib import Path
path = Path('/path/to/xml/files')
def process_files(path):
ns = {"edf":"http://www.vilauma.de/edf/Hotel",
"atmt": "http://www.vilauma.de/edf/Hotel/Allotment"}
for p in glob.glob(str(path / 'hotels' / 'hotelonly' / '*.xml')):
try:
tree = ET.parse(p)
except ET.ParseError:
logging.error("An error has occured while parsing %s", p)
else:
root = tree.getroot()
basic_data_node = root.find("edf:BasicData", ns)
try:
print(basic_data_node.get('Code'))
except AttributeError:
logging.error("Missing 'Code' attribute in %s", p)
if __name__ == '__main__':
process_files(path)
¿How do you alternate True and False?
Code sample:
class Alternator(object):
def __init__(self, startswith=True):
self._value = int(not startswith)
@property
def value(self):
self._value = 1 - self._value
return bool(self._value)
if __name__ == '__main__':
a = Alternator()
for i in range(0,10):
print(a.value)