Current version: 0.8 (5/2015)
[Update 03/2016: The newest AMS does not work in LANA mode if you use the default package. A modified package can be downloaded here: AMS 1.5.3 (LANA). Thanks to the archives in Cologne and London for triggering this and for helping to figure it all out. In the process, a step-by-step deployment guide was also produced, which you can read on this page.]
[Update 12/2015: Since December 2015, updated versions of the individual applications are available for download: AMS 1.5.3, Metadata Search 2.0.1, LAMUS 1.3.5, ASV 1.3.0, Metadata TranslationService 1.5.1, RRS 1.6.0b. See the corresponding news post for the changes in these versions.]
(previous versions: 0.72 in 10/2014, 0.72-preview for the 8/2014 INNET workshop, 0.71 in 5/2014)
The LAT suite covers the standard set of software tools for archiving and utilizing linguistic data that are being developed by The Language Archive at the MPI for Psycholinguistics. This note describes the VMWare image of the LAT deployment package, which allows interested users to install the LAT suite on their servers. It is also possible to use this image as a basis for updating existing servers. Contact us for instructions.
Additional components are available, separately, for Lexicon processing, Shibboleth support, federated annotation search, faceted metadata browsing and search, OAI integration, Handle System persistent identifier support and automated archive checks and statistics.
The VM has been tested in VMWare, but may also work in other hypervisors. The standard configuration uses 4 GB RAM, 1 single-core CPU and an 8 GB disk, part of which is used for swap. With more CPU cores and RAM, you get better speed under load; Trova can also distribute single queries over multiple cores. The virtual hard disk size is suitable for testing (at least 1 GB is left free), but for real archives you first have to mount a second, larger disk at /lat/corpora. If you have problems booting, try changing your virtual disk controller type.
For a large archive with tens of TB of data, several hundred thousand files and lots of annotation text, you should consider at least 16 GB of RAM and a reasonably modern multi-core CPU, or at least 4 cores in a virtual server scenario. For example, our archive has almost 1’000’000 files, of which more than 200’000 are annotation files with a total of almost 200’000’000 annotation strings. In addition to Trova search, we also provide a CQL search web service. In busy moments, the whole archive uses up to 8 CPU cores and 32 GB of application memory, or 64 GB if you also count cache and buffer memory. More and faster CPU cores and RAM speed up concurrent content searches, but make little difference for everyday activities.
Regarding archive storage, you want to support both large files and many files: recordings, metadata, annotations… Fast sequential read access will improve the user experience more than tuning for, say, many concurrent writes. Every archive should use redundant disks and good backups. Do not forget to also back up PostgreSQL dumps (made with pg_dump) of at least the core databases (corpusstructure, ams2, lamus). Backups of the (often large) “annex” database for annotation search are less important, as that database is refilled by running the search ingester cronjob. Some archives will also have a Handle System database, and maybe even Lexus/Vicos, DWAN, Annis or PID service databases.
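As a minimal sketch of such a database backup, the Python helper below builds one pg_dump call per core database named above and can optionally run them. The database user "postgres", the date-stamped file names and the backup directory are assumptions for illustration, not part of the LAT package; the example backup cronjob shipped with the VM remains the reference.

```python
import datetime
import pathlib
import subprocess

# Core LAT databases named in the text. The annex database is left out on
# purpose, because the search ingester cronjob can refill it.
CORE_DATABASES = ("corpusstructure", "ams2", "lamus")


def build_dump_commands(backup_dir, databases=CORE_DATABASES, user="postgres", stamp=None):
    """Return one pg_dump command (as an argument list) per database.

    "-F c" selects pg_dump's custom archive format, which pg_restore can read.
    The database user "postgres" is an assumption; adjust it to your setup.
    """
    if stamp is None:
        stamp = datetime.date.today().isoformat()
    backup_dir = pathlib.Path(backup_dir)
    commands = []
    for db in databases:
        target = backup_dir / f"{db}-{stamp}.dump"
        commands.append(["pg_dump", "-U", user, "-F", "c", "-f", str(target), db])
    return commands


def run_backups(backup_dir, **kwargs):
    """Create backup_dir if needed and run every dump, stopping at the first failure."""
    pathlib.Path(backup_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_dump_commands(backup_dir, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, run_backups("/lat/backup/db") (a hypothetical path) would write three .dump files there. The custom format is compressed and lets pg_restore restore individual tables if needed; the files still have to be copied to a separate server afterwards.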
If you have questions, do not hesitate to contact us at “latadmin mpi.nl” (insert @ after latadmin). You can also post in our “General Archiving” forum on this site.
Regularly updated demo server snapshots are available as a tarball of the /lat/ directory or as a complete VM snapshot. Please verify the integrity of your download using the checksums listed below. The downloads also contain a readme.txt file in the /lat directory. It contains information about many relevant aspects of managing a LAT server, as well as some more exotic hints. You can also download a copy of the readme.txt file separately.
Obviously you will need some passwords to access the demo server when you run it. Those are listed in the readme file, so it is important to change the passwords for ssh and web access on production servers! The demo server does not allow ssh access for root: instead, log in as the special admin user, then use sudo. Details are explained in the readme file. All demo server accounts are local: in the default configuration, you cannot use existing accounts that you may have on other servers.
It is recommended to set a root password for access from a local terminal, which can be needed to run filesystem checks and other maintenance tasks. To set a root password, or the password of other Linux users, simply use the “passwd” command as root. To change the web password of AMS users, use the AMS website (reachable via “manage access rights”). You should not need to change database or other passwords, but contact us for assistance if you still want to do it: You will also have to update all config files which mention database passwords.
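If you do change a database password, a small search like the following can help locate config files that still mention the old value before you edit them. This is a hypothetical helper, not part of LAT: it only lists files, it does not change anything, and which directories to scan (for example /lat/conf or all of /lat) is up to you.

```python
import os


def files_mentioning(root, needle, encoding="utf-8"):
    """Yield paths under root whose text contains needle.

    Undecodable bytes are ignored rather than causing the file to be skipped;
    files that cannot be opened at all (permissions, broken symlinks) are skipped.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding=encoding, errors="ignore") as fh:
                    if needle in fh.read():
                        yield path
            except OSError:
                continue
```

For example, sorted(files_mentioning("/lat/conf", "old-password")) would list the candidates; the placeholder "old-password" stands for your current value.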
Software versions included, as of 2015-05-13:
- New HTML-based ASV Metadata Browser 1.2.0, old applet-based IMDI browser 1.4.0 still included, but disabled by default
- New HTML-based Metadata Search 2.0, old applet-based interface still supported but not shown by default. Refresh your search index by crawling your archive in LAMUS (crawler 22.214.171.124) when upgrading!
- AMS2 126.96.36.199 (unchanged) access management system with Resource Request System 1.5.2-37437
- Trova 1.5.39358 (unchanged) annotation content search tool, with ingester cronjob version 1.4.11
- Imex 1.1.39560 (unchanged) image thumbnail viewer
- Annex 1.6 annotation viewer with Apache H264 streaming module and Spark media player, PHP-free and Perl-free
- LAMUS 1.3, archive upload and management system. Note: the LAMUS database is owned by webuser. If you upgrade from a corpman-owned variant, check your Tomcat config and apply “REASSIGN OWNED BY corpman TO webuser;”!
- RRS 1.5.2 resource request system for submitting access grant requests to corpus managers
- SuSE Linux Enterprise Server 11 Service Pack 3 (x86_64) with VMWare client driver support
- Apache 2.4.12 web server with OpenSSL 1.0.1m, using apr 1.5.2 and apr-util 1.5.4
- PostgreSQL 9.3.6 database server
- Java 1.7.0_79 (last Java 7 version, Java 8 not yet supported)
- Tomcat 6.0.43 with mod_jk Tomcat connector 1.2.40
- Some Java browser applet security warnings resolved in LAMUS, other components replaced by versions which work without Java browser applets. This also improves accessibility from mobile devices.
- Flash browser plugin required for Annex and Trova (Flash is built-in for some browsers)
- HTTPS supported by default, using TLS. Self-signed sample certificate included for testing, please acquire professional certificates for production archives!
- Configuration, cert.py and init scripts updated
- Compression script for htaccess files; the htaccess security bug was already fixed in 0.72
- Readme file (/lat/readme.txt) extended to cover an even wider range of admin topics
- Better documentation about upgrades and config changes, such as hostname changes
- Script included to “rebase” the server to another hostname, domain, etc.
- Now with example database backup cronjob. Do not forget to copy the backup files made by the cronjob to a separate server on a regular basis!
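The LAMUS ownership change from the list above can be applied with psql. The sketch below only builds the command unless dry_run is False. Connecting as the "postgres" superuser is an assumption about your setup; PostgreSQL requires the executing role to have privileges on both roles. REASSIGN OWNED only affects objects in the database you are connected to, which is why the command targets the lamus database explicitly.

```python
import subprocess


def reassign_lamus_owner(database="lamus", old_owner="corpman", new_owner="webuser",
                         psql_user="postgres", dry_run=True):
    """Build, and optionally run, the psql call that hands LAMUS objects to webuser.

    psql_user="postgres" is an assumed superuser account, not a LAT default.
    With dry_run=True (the default) nothing is executed; the command is only returned.
    """
    sql = f"REASSIGN OWNED BY {old_owner} TO {new_owner};"
    cmd = ["psql", "-U", psql_user, "-d", database, "-c", sql]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Reviewing the returned command first and only then calling it with dry_run=False keeps the change deliberate; make a database backup before running it.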
The tarball simply contains the /lat directory of the VM image. This can be used to update an existing install. You can also use it to install the software on any Linux server, but in that case you will first have to create the necessary Unix users and groups, and you may also need to install some dependencies. Contact us for more information.
No matter if you use the tarball or the full VM image, it is always important that you install updates provided by the operating system on a regular basis! For the SLES based VM image, this means that you should register a free account at SuSE / Novell for updates and notifications.
Since 2014-08-20, the VM itself has been based on SLES 11.3 (x86_64), but we might switch to Ubuntu or a similar Linux distribution, as those already include sufficiently modern versions of Apache, PostgreSQL, Java, Tomcat and OpenSSL out of the box. Then we would no longer have to provide custom-compiled builds of these packages for LAT. Please share your thoughts on this. We no longer include a full OpenLDAP installation; only a tarball of an older, LAT-configured version is included for reference.
For testing, you can run the system anywhere: edit the /etc/hosts file (or its equivalent) on your client to map the IP address of your system to www.demoserver.fake, and make sure that your web browser does not use a proxy to access that server. Alternatively, log in on the system and run the “rebase” script to make it use the actual hostname of where you have installed it. Note that this only works for a single change from www.demoserver.fake to another name, not for repeated changes.
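The hosts-file step can be sketched as a pure text transformation, shown here in Python. The function name is hypothetical; writing the result back to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows) needs administrator rights and is left to you. The VM IP in the usage note is only an example.

```python
import ipaddress

DEMO_HOSTNAME = "www.demoserver.fake"


def add_hosts_entry(hosts_text, vm_ip, hostname=DEMO_HOSTNAME):
    """Return hosts_text with one line mapping vm_ip to hostname.

    Lines that already map hostname are dropped first (including any other names
    on such a line), so calling this again with a new IP replaces the old mapping
    instead of adding a duplicate. Comment text after '#' is ignored when matching.
    """
    ipaddress.ip_address(vm_ip)  # raises ValueError for a malformed address
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            continue
        kept.append(line)
    kept.append(f"{vm_ip}\t{hostname}")
    return "\n".join(kept) + "\n"
```

For example, add_hosts_entry(current_text, "192.168.56.101") returns the new file contents with the demo server mapped to that address.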
The recommended upgrade path is to update the OS yourself and download only the tarball, which contains only the /lat/ directory. You should have /lat/ and possibly also /lat/corpora/ on separate partitions to keep them separate from the raw operating system in the / partition. This can make OS upgrades easier.
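To check whether /lat and /lat/corpora are really separate partitions, and how much space is left on them, a quick report like the following can be used. This is a hypothetical helper using only the Python standard library; paths that do not exist are reported as None.

```python
import os
import shutil


def describe_mounts(paths=("/lat", "/lat/corpora")):
    """Report, for each path, whether it is a mount point and its free space in GB."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = None
            continue
        usage = shutil.disk_usage(path)
        report[path] = {
            # True only if path is the root of its own filesystem
            "separate_mount": os.path.ismount(path),
            "free_gb": round(usage.free / 1024**3, 1),
        }
    return report
```

If "separate_mount" is False for /lat/corpora on a real archive, the archive data still lives on the same disk as the rest of /lat (or even the operating system).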
File properties for version 0.8, 2015-05-13:
- tla-demoserver-0.8-lat-directory-2015-05-13.tar.bz2 is 644 143 588 bytes, sha1 checksum 33127e5ea242004c2e8254e8f6883a821d75f8ef
- tla-lat-demoserver-0.8.7z is 1 166 735 125 bytes, sha1 checksum 6d0242c1e01c59d8636c167d7d8180b4b8e3fd0c
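The sizes and SHA-1 checksums listed above can be checked with Python's hashlib, as in this sketch. The file is hashed in 1 MiB chunks so that the 1.1 GB VM archive does not have to fit in memory; the cheap size comparison runs first. Looking files up by their download name is an assumption: a renamed file needs an explicit entry.

```python
import hashlib
import os

# Sizes (bytes) and SHA-1 checksums for version 0.8, copied from the list above.
EXPECTED = {
    "tla-demoserver-0.8-lat-directory-2015-05-13.tar.bz2":
        (644143588, "33127e5ea242004c2e8254e8f6883a821d75f8ef"),
    "tla-lat-demoserver-0.8.7z":
        (1166735125, "6d0242c1e01c59d8636c167d7d8180b4b8e3fd0c"),
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED):
    """True if the file's size and SHA-1 match the entry for its base name."""
    size, sha1 = expected[os.path.basename(path)]
    if os.path.getsize(path) != size:
        return False
    return sha1_of_file(path) == sha1
```

For example, verify_download("tla-lat-demoserver-0.8.7z") should return True for an intact download in the current directory.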
You need the free 7zip tool to unpack the VM, see the 7zip website; a zip file would have been even larger, and some users have limited bandwidth. If you have trouble booting the VM, changing your VirtualBox or VMware configuration from virtual SCSI to virtual SATA disks should help. The older handout innet-workshop-2014-08-25-lat-there-be-light-tla-server-upgrades-6up-handout gives a quick overview of the install/upgrade process for version 0.7x and might still be of interest for version 0.8 and newer.
EXTRA INFO regarding the installation and upgrade process:
- When installing only the ASV metadata browser manually on your old server, you have to add a resource AMS2-CMDI-DB to your config, which has to be a copy of the normal AMS2 database resource. Despite its name, this is not actually related to CMDI: it is just a badly chosen name for the database connection that is shared between the CMDI and IMDI modes of the metadata browser.
- When upgrading some components from older versions, you may also have to run GRANT SELECT ON versions TO "imdiArchive"; in the corpusstructure database using psql.
- Bug in version 0.72: Set authenticationService to amsAuthenticationSrv (not to integratedAuthenticationSrv) in webapps/*/current/WEB-INF/web.xml to avoid long waits for non-existing LDAP.
- Known bug in 0.8: In the new advanced metadata search, the advanced option to search ONLY in descriptions does not yet work. You can use the simple search tab to search in all fields instead.
- Depending on your mail infrastructure, nl.mpi.smtpHost and /ams2/config/mailhost in /lat/conf/* need editing
- You may have to copy corpus license texts from your old installation
- If your /lat/corpora is a separate mount, umount it before moving /lat itself! Remount only after the demo archive step, possibly renaming the demo /lat/corpora/ first and making a new mountpoint
- When copying the database backup directory, copy only its contents, not the directory itself, if the target directory already exists
- Users with a 32-bit OS or a SLES version older than 11 have to install SLES 11.3 64-bit before upgrading, but only AFTER making database backups, and of course WITHOUT damaging or otherwise changing /lat or /lat/corpora during the operating system upgrade
- When upgrading from AMS older than 1.4.9, database transformation scripts have to be applied as part of the upgrade.
- When upgrading from really old installations, the corpusstructure accesslevel and pid columns may have to be added. Contact us for assistance.
- Take care with the different /lat/corpora scenarios (separate mount or not) when moving!
- For upgrades from old systems, do not forget to also upgrade the ingester cronjob line in /etc/crontab
- SLES 11.3 ISO files, for upgrade and fresh install, are available from the SuSE website after a free registration – only support and automatic updates are non-free.
- It seems people have been using LAT software with OpenSuSE and other distros with minimal changes, reports and howtos would be welcome.
- The crawl function in LAMUS has remote IMDI access disabled – use a script in /lat/tools/crawler/ if you want to crawl remote corpora: Start from a node which is still on your own server and add the options -checkHTTP true -followHTTP true -readAllIMDI true to the options in your script (e.g. between -bootstrap false and the -searchServletURL … option) …
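For the 0.72 authenticationService bug mentioned in the list above, the edit to webapps/*/current/WEB-INF/web.xml can be scripted. This sketch is hypothetical: it does a plain text replacement of integratedAuthenticationSrv with amsAuthenticationSrv, assuming that value only occurs as the authenticationService setting, and it takes the location of your webapps directory as a parameter because that location depends on your install. Changed files are backed up as web.xml.bak first; Tomcat will typically need a restart for the change to take effect.

```python
import glob
import os
import shutil

OLD_VALUE = "integratedAuthenticationSrv"
NEW_VALUE = "amsAuthenticationSrv"


def fix_authentication_service(webapps_root, dry_run=True):
    """Replace OLD_VALUE with NEW_VALUE in every */current/WEB-INF/web.xml.

    Returns the files that contained OLD_VALUE. With dry_run=True (the default)
    nothing is written; otherwise each file is copied to web.xml.bak first.
    """
    pattern = os.path.join(webapps_root, "*", "current", "WEB-INF", "web.xml")
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        if OLD_VALUE not in text:
            continue
        changed.append(path)
        if not dry_run:
            shutil.copy2(path, path + ".bak")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text.replace(OLD_VALUE, NEW_VALUE))
    return changed
```

Running it once with the default dry_run=True shows which web.xml files would be touched before anything is modified.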