
Well I didn't update for a while. Lost a ton of music related stuff (instruments, etc) due to a fire, that sucked. Will write more about that at some point, maybe, though also maybe will try to forget heh.

What I'm really posting here is to claim a momentary victory (it won't last long no doubt) against the hordes of scrapers effectively DDoSing web sites out there. Our forums would have 100,000+ "guest" users "browsing," causing all kinds of performance issues. Last night and today, I added some stuff requiring some computation via JavaScript and a little bit of user interaction to authenticate the guest user as not-a-bot, and now we have a few hundred. What a mess the internet is. :/ Sigh.

Update Jan 2:
There are (currently) four layers of scraper/bot/ddos mitigations we are using:
anyway that's the current state of trying to run a forum with 280,000 threads and a few million posts these days. I'm sure it won't last, they will start defeating our countermeasures and we'll have to escalate them. I really do hate requiring JavaScript, some people don't want to do that. sigh.

3 Comments
TIL the default Apache Timeout is set to 300s, which is a long ass time, especially when dealing with crawlers that sometimes leave their connections open after their request. And they'll fill up the default 256 connection slots pretty fast, making things slow as shit. 10 or 15 is probably plenty.
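For reference, the relevant httpd.conf directives look something like this (values illustrative, tune to taste):

```apache
# Drop idle/stalled connections much sooner than the 300s default
Timeout 15
# Idle keep-alive connections get an even shorter leash
KeepAliveTimeout 5
```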

2024 Retrospective
January 18, 2025
My 2024 stats (2023 was here): The slowdown of getting old continues, I guess?

1 Comment
programming is fun
November 2, 2024
Steve entered the Western States 100 lottery after qualifying in January... They are limited to 369 runners each year, and it's a legendary race, so there's a lot of demand. They have an interesting lottery structure: every year that you qualify and enter the lottery, if you are not picked, you get twice the entry tickets the next year.

After some early morning text messages where I tried (and to be honest, failed miserably) to calculate his odds, I wrote a little program in C to do the calculation.

Then, because sometimes programming is fun, I decided to see what it would look like in Perl. I haven't written a ton of Perl, I usually lean towards PHP for this sort of thing, but I ported SWELL's resource-processor script to Perl a while back and enjoyed it, and having a simple task is good for learning.

The first pass I did used a built-in array-shuffle module, which proved too slow; then I ended up simplifying it and got it significantly faster than the C version (which didn't reshuffle, but did end up with pointless memmove()s). Once I had it all working (using strict, i.e. you have to declare everything), I decided to see how small I could get it to be. Here's what I ended up with (you can pipe the table on their lottery entry page to it):
$nwin = 250;    # I think the lottery picked about 250-something winners last year, the rest were golden tickets and such
$nlot = 100000; # someone could quantify the margin of error based on this, with statistics you would oh look a bird

for ($pid = 0; <>; ) {
  ($nt, $np) = split(/\s+/, s/,//gr); # /g: strip every thousands-separator comma
  ($nt > 0 and $np > 0 and not exists $wcnt{$nt}) or die "invalid input: $_\n";
  $wbin{$pid} = $nt;
  $wcnt{$nt} = 0;
  push(@tk, ($pid++) x $nt) while ($np-- > 0);
}

printf("%d tickets for %d entrants, running %d lotteries for %d winners:\n", $tkcnt = @tk, $pid, $nlot, $nwin);

for ($x = 0; $x < $nlot; $x++) {
  %in = (); # empty list, not {} -- a hashref here would leave a bogus key behind
  $in{$id = $tk[rand($tkcnt)]}++ == 0 and exists $wbin{$id} and $wcnt{$wbin{$id}}++ while (%in < $nwin);
}

printf("%d tickets: %.2f%% win\n", $_, $wcnt{$_} * 100.0 / $nlot) foreach (sort { $a <=> $b } keys %wcnt);
Here is the most recent input:
512	 	1	 	512
256	 	15	 	3,840
128	 	55	 	7,040
64	 	139	 	8,896
32	 	215	 	6,880
16	 	296	 	4,736
8	 	594	 	4,752
4	 	963	 	3,852
2	 	1,538	 	3,076
1	 	2,077	 	2,077
and here is the most recent output with that table:
45661 tickets for 5893 entrants, running 100000 lotteries for 250 winners:
1 tickets: 0.66% win
2 tickets: 1.29% win
4 tickets: 2.56% win
8 tickets: 5.08% win
16 tickets: 9.99% win
32 tickets: 18.98% win
64 tickets: 34.12% win
128 tickets: 56.51% win
256 tickets: 80.91% win
512 tickets: 96.24% win
So, Steve's odds as of this afternoon are about 0.66%, but that will almost certainly go down (there's still a month left of the lottery; it only opened yesterday). Interestingly, one entrant there has been turned down 8 times before -- they currently have a 96% chance of getting in. And those who have been turned down 6 times before are slightly more likely than not to get in.

2 Comments
at this age
September 26, 2024
there are very few things more satisfying than cutting a slot in the previously stripped head of a screw using a dremel and then using a flathead screwdriver to remove it

3 Comments
When I first started programming C++ in the 1990s, there were language features that I found appalling. One of those was operator overloading, which was used in the most basic of C++ example code, e.g.
  cout << "hello, world" << endl;

This made the C programmer in me squirm. Why would you make the meaning of operators change wildly based on the context? Also you might lose track of what code is actually being generated. It could have side effects that you don't know about, or be orders of magnitude slower! I still feel this way, and avoid operator overloading, other than operator=, for most things.

...but having said that, I find operator overloading to be incredibly valuable when it comes to maintaining and refactoring a large code base. A pattern that has become routine for us: a basic type is initially used for some state, and later needs to be extended.

For example: in track groups, we originally had 32 groups (1-32), and a track could have different types of membership in any number of groups. For 32 groups, we used unsigned ints as bitmasks. Some years later, to support 64 groups we changed it to WDL_UINT64 (aka uint64_t). Straightforward (but actually quite tedious and in hindsight we should've skipped the type change and gone right to the next step). To increase beyond 64 bits, there's no longer a basic type that can be used. So instead:

  struct group_membership {
    enum { SZ = 2 };
    WDL_UINT64 m_data[SZ];

    const group_membership operator & (const group_membership &o) const
    {
      group_membership r;
      for (int i = 0; i < SZ; i ++) r.m_data[i] = m_data[i] & o.m_data[i];
      return r;
    }
    // and a bunch of other operators, a couple of functions to clear/set common masks, etc

  private:
     // prevent direct cast to int/int64/etc. necessary because we allow casting to bool below which would otherwise be quietly promoted to int
    operator int() const { return 0; }
  public:
    operator bool() const { for (int i = 0; i < SZ; i ++) if (m_data[i]) return true; return false; }
  };

Then we replace things like:

  WDL_UINT64 group;
with
  group_membership group;

and after that, (assuming you get all of the necessary operators mentioned in the comment above) most code just works without modification, e.g.:

  if (group & group_mask) { /* ... */ }

To be fair, we could easily tweak all of the code to use functions instead of operator overloading, and not do this, but for me, knowing that I'm not screwing up the logic in some subtle way is a big win. And if you have multiple branches and you're worried about merge conflicts, this avoids a lot of them.

Also, it's fun to look at the output of gcc or clang on these things. They end up producing pretty much optimal code, even when returning and copying structs. Though you should be sure to keep the struct in question as plain-old-data (ideally no constructor, or a trivial constructor, and no destructor).

Thus concludes my "a thing I appreciate about C++" talk. Next week, templates.

Recordings:

super8_novideo - 1 -- [9:36]
super8_novideo - 2 -- [4:16]
super8_novideo - 3 -- [4:34]
super8_novideo - 4 -- [8:20]

12 Comments

My last trail race was a bit over a year ago and apparently these experiences are some of the few I find worth documenting, so I'll bring the imaginary reader up to date on the last year of running/hiking related activities/health/etc, even though my 2023-in-review post did some light quantification.

After Ray Miller I kept running and ran some road races in NY, including the Brooklyn Half, and a 5k a week later (when I wasn't fully recovered, which was also fun but probably too hard too soon). Later on in May I started having some leg nerve feelings, which ended up being caused by a herniated disc, so I had to cool it on the running for a bit. I started walking a lot. Pretty much any time I would normally bicycle somewhere, I'd walk instead. And as advised, I got a routine going of core strength exercises, and figured out how to continue my glute/hamstring PT. The gym, ugh. I think I read somewhere that gymnasiums were originally places where people would work out in the nude. Maybe the root was the Greek word for nudity? anyway I digress. I find myself doing this in text messages too, saying way too much. Do I do it in person too and not notice it because there's no written record of it?

In the summer, Steve, Edward and I all signed up for the January 27, 2024 Sean O'Brien 50 miler. Edward and I ran this race in 2020, right before the pandemic, and joked about how covid would be nothing. When I signed up for the 2024 race, I wasn't running, did not know if I would be running by January, but I figured worst case I could try to power hike it.

I walked the NY Marathon in November (in 5:20 or so), which was a fantastic experience and I would recommend it to anybody who likes to walk. I took videos of a lot of the bands who played and then had a good Strava post which read as a tripadvisor review of the New York Music Festival -- too much walking! I should've posted those videos here. Maybe I still will. Let me know in the comments if you think that's a good idea.

A couple of weeks after the NY Marathon, I started running again, and worked up (with a covid-intermission) to a few 15-20 mile weeks, on top of 40-60 miles of walking. When I was running at my peak, the most miles per week I'd ever sustained was about 40, so I was feeling pretty good about the volume and time on my feet. Then, the week before the race, the SOB organizers sent out an email that mentioned you could change distances, and also that if you were running the 100k and you missed the 17 hour cutoff (or decided you wanted to bail), you could drop to the 50 miler during the race, at mile 43. So it became a fathomable thing -- sign up for the 100k, and if you're not feeling it at 43, just do 50. And not only that, if things go really poorly, it buys you another half hour for the 50 miler. Steve and I decided to do that.


(an old friend saw me off on my journey)

We drove to LA.


(this dog barked at me until I acknowledged him at the red light)


The (almost)-annual pilgrimage to Tacos Delta. Saw Paranorm. ChatGPT told us (and Wikipedia eventually confirmed) that Tarzana was named after the creation of its resident Edgar Rice Burroughs. Steve walked 15 miles the day before the race (!).


drop bags

Gear for the race:

The race -- forecast was a low of 55 and a high of 72. Turns out the start was considerably colder, though, due to microclimates of the valley. But it was still quite pleasant having so recently been in the 20-degree highs of NY.


Psyched sideways

The first half of the race was more of a run than a race, as these things go.


The race begins on a road. Hey wait.

The water crossing at mile 2 was quite a bit higher and unavoidable this year. In 2020 I managed to keep my feet dry. The wool socks I was wearing quickly dried out and didn't give me any problems.









I changed my shirt and hat and dropped off my headlamp at the drop bag at mile 13. Around that time I noticed some bits of chafing in spots, put some bodyglide on there and it stopped being a problem.

Peanut butter pretzels were good, stroopwaffels too. I think I might have accidentally dropped a wrapper though, ugh, sorry, hopefully someone picked it up. I put it in my pocket and closed the zipper but when I went to open the pocket at the aid station to dump the trash it was gone. Or maybe I misplaced it. Your brain sometimes loses track of all of these things.


why didn't someone tell me that water bottle looks ridiculous in the pocket?


group of mountain bikers having lunch, I assume. nice spot for it.


this time, on the descent to Bonsall, I could see the trail across the valley that we would later be on. makes me giddy!


settling in to the race and getting ready to fall apart at mile 22, Bonsall


At mile 22 I stopped, saw a guy (hi, Peter) whom I had previously mistaken for Steve, put some squirrel nut butter and a bandaid on a hotspot on my big toe (worked well, never used that stuff before). Filled up to the brim with water.



(crows getting a free ride on the ridge)

I paid more attention to birds this year, and not just the crows. I'd like to go back to these trails with a camera and big lens and time to spare.

The massive climb out of Bonsall was nice since I knew what to expect (in 2020 it was demoralizing, lol), but it was really hot. There was a stream crossing where dipping your hat in the water was an amazing feeling (though within minutes it was back to heat). If I had more time I would've sat in it.

The second half of the race was more difficult. I no longer had the energy to take pictures. The aid station around the halfway point had a lot of bacon. I really wanted some but I couldn't bring myself to eat any. This seems to happen to me at this point, nausea and stuff. I need to figure this out (brain thinks I have food poisoning or something?). Maybe I should've tried a gel. Doesn't require chewing and pure sugar, might have been worth the try. Hindsight.

At mile 37-ish, drop bag again, grabbed lights, long sleeved shirt, other hat. Didn't want to mess with my socks so kept my original pair.

I kept moving, snacking a little bit here and there, trying to down some tailwind along with the water, hanging on. By mile 43 (nearly 11 hours after the 5:30am start) I was 5 minutes ahead of my 2020 time, and only 10 minutes behind Steve, but I really couldn't eat anything. I overheard a couple of people drop to the 50 miler. My legs felt OK, and it turned out if I continued on with the 100k route, I could always drop at 50 miles (since it was a 6-mile each way out-and-back that turned around near the finish). So I continued on. Up a hill, then down a really massive hill. Going up the hill was fine. Going down the hill was difficult. I haven't done enough quad strength training. Tsk tsk. I ran a little bit of it but it was mostly walking. Ate maybe 3 almonds, drank a few swigs of tailwind. It was starting to get dark. At the bottom of the hill it was along a riverbed for a while. Lots of frog sounds. I saw Steve when I was about 15 minutes away from the 50 mile aid station (so his lead was up to about 30-45 minutes at that point, I guess?).

The aid station people gave me a quarter of a banana, which I ate. It was not easy. They were nice (they are all). Someone I talked to earlier in the race asked if I had a pacer for this part, then looked at me like I was crazy for not. I remembered this, and asked if there were any freelance pacers available. No such luck.

Did the riverbed commute back to the climb, now with my head(waist)lamp on. Coming down the hill was a race marshall, sweeping the course. Nobody else would be coming down the hill. I could see headlamps far ahead, and occasionally see them far behind me, but for a long time I saw nobody, and heard only the sounds of frogs and wind. The moon rose, it had a lot of clouds in front of it and looked very red on the horizon.

I was running a huge calorie deficit and was having to regulate my climbing of the hill in order to prevent bonking. I'd go too hard and have to back off because I could feel it wouldn't be sustainable. This was the toughest part of the experience, I think, this climb. When I was eventually caught by another runner, it was nice.

Going over the hill and back down to the mile 43 aid station (again, though now at 55-ish), with 7 miles to go. This aid station is a bit remote and you can't drop out of the race there, and I guess it was getting late, so the aid station captain was really big on getting me moving. Tried to get me to eat, but when I did my best picky eater impression he said (very nicely -- everybody volunteering at the aid stations was amazing) something to the effect of "well water is what you need most right now, now get moving." So I did. I ended up not having any real calories to speak of for the last 20 miles of the race, heh. Though almost all of those 20 miles were walked, not run.

After that final aid station, the last 7 miles were challenging but also pretty straightforward, the finish was within reach, and I had plenty of time to not hit the cutoff at a walking pace. My underwear had bunched up and I had some pretty significant butt chafing but it was too late to do anything about it, just had to suffer with it. Should've checked in for it hours ago, doh. Once I got to the flat ground of the last mile, walking at about 13 minutes/mile to the finish felt pretty good (flat!). I was sort of hoping to be DFL, but not enough to slow down when I saw some headlamps behind me.


After more than 16 hours of running and hiking, Steve was waiting for me at the finish (having waited 90 minutes! <3). There was food, but it would be hours until I could eat anything meaningful. We headed back to Tarzana, and watched some TV (was it Boys or 30 Rock? I can't remember) before crashing.

I got the shivers again. Seemed to be associated with movement, too. Didn't last too long, and not so bad. Way better than covid. Apparently it's about inflammation.


The next day Edward made us steak. Amazing.


There was ice cream, and a cold swim in a 55F pool. Total win.

Am I ready to do this race (including its 13,000ft of climbing and descent) again? No. But it won't be long.

4 Comments

2023 Retrospective
January 4, 2024
I could talk about things that were important in the world last year but I don't think I have anything terribly constructive to add, so instead I will post this:

At the end of 2021 I calculated some stats, but apparently I forgot to do anything for 2022. Here's 2023: I guess the overall trend is that I'm slowing down! Something to think about...

Recordings:

sandwich terrier

It's now that time when I bitch about, and document, my experiences dealing with Apple's documentation/APIs/etc. For some reason I never feel the need to do this on other platforms (maybe it's that they tend to have better documentation or less fuss to deal with, I'm not sure why), but if you search for "macOS" on this blog you'll find previous installments. Anyway, let's begin.

A bit over a year ago Apple started making computers with their own CPUs, the M1. These have 8 or more cores, but have a mix of slower and faster cores, the slower cores having lower power consumption (whether or not they are more efficient per unit of work done is unclear, it wouldn't surprise me if their efficiency was similar under full load, but anyway now I'm just guessing).

The implications for realtime audio of these asymmetric cores are pretty complex and can produce all sorts of weird behavior. The biggest issue seems to be when your code ends up running on the efficiency cores even though you need the results ASAP, causing underruns. Counterintuitively, it seems that under very light load, things work well, and under very heavy load, things work well, but for medium loads, there is failure. Also counterintuitively, the newer M1 Pro and M1 Max CPUs, with more performance cores (6-8) and fewer efficiency cores (2), seem to have a larger "medium load" range where things don't work well.

The short summary: Perhaps this was all obvious and documented and I failed to read the right things, but anyway I'm just putting this here in case somebody like me would find it useful.

5 Comments
Ah, Big Sur
March 22, 2021
This most recent (as of this blog post) macOS update (and new architecture) has raised a number of issues with REAPER development -- I'm documenting them here in hopes of it being useful to someone:



There were other things that came up that aren't as interesting. When launching Audio MIDI Setup.app, the new path is /System/Applications/Utilities/Audio MIDI Setup.app rather than /Applications, for example. Apple likes to change the appearance of NSTableViews and then provide ways to get them partially back to their previous style, but not all the way. Stuff like that... Dealing with macOS updates and compatibility is really not enjoyable. So tiring. Though the x87 vs SSE thing isn't their fault -- no x86_64 system has ever not had SSE...

4 Comments
radiohead puzzle slowness
September 28, 2020



Recordings:

Not Vampires - 1 -- [8:08]
Not Vampires - 2 -- [5:25]
Yes, Exactly, Yes! - 1 -- [6:08]
Yes, Exactly, Yes! - 2 -- [7:44]
Yes, Exactly, Yes! - 3 -- [11:40]
Yes, Exactly, Yes! - 4 -- [4:27]
Yes, Exactly, Yes! - 5 -- [12:59]
Yes, Exactly, Yes! - 6 -- [4:44]
Yes, Exactly, Yes! - 7 -- [5:52]
Yes, Exactly, Yes! - 8 -- [5:37]
Yes, Exactly, Yes! - 9 -- [9:12]
Yes, Exactly, Yes! - 10 -- [4:50]
Yes, Exactly, Yes! - 11 -- [9:59]


postpone 2020
March 24, 2020
Ah so it's been almost a week since I've posted here. Time got really really slow for a bit there but now maybe it is speeding up as life as we know it becomes more routine.

Such a bizarre predicament that nearly (?) all of humanity faces. We're all in it together, unless we're in denial. When the weather has been nice (most days since I last wrote), the parks have been crazily busy, which makes running really difficult. So instead:

(lucky me with roof deck access) An hour is an excruciatingly long time. At one point before I tried an hour I contemplated doing 24 hours. Now I think that's insane. It might happen. Yesterday was cold and rainy and amazing, I could run wherever I wanted and nobody was around! Bring back winter!

I'm now on board with everybody wearing masks (not N95 but procedure masks etc) when they are out in public. It makes a lot of sense, given the fact that people are contagious before and/or without symptoms. The advice here has always been "wear a mask only if you're sick" which is no longer applicable! Anyway I've spent a great deal of time making a mask (and now starting a second) from this site's template/instruction, which seems to work well! I don't have a sewing machine so I'm using a couple of tiny travel sewing kits that I've acquired over the years and doing all of the stitches by hand. I'm terrible at it, but getting less so as time goes on.

Musically, I've turned to NINJAM at home again, Andy and I played the other night which was nice. Hope to do more of that!

(obviously it goes without saying, all the people out there who are helping other people: doctors, nurses, delivery people, people working in grocery stores, thank you thank you thank you. I'll try to say this more in person but it's hard with all of the other pressures of life outside of home)

9 Comments
Five years ago, in the year of our lord 2014, I wrote about the difficulties of drawing bitmapped graphics to screen in macOS, and I revisited that issue again in the winter of 2017.

Now, I bring you what is hopefully the final installment (posted here for usefulness to the internet-at-large).

To recap: drawing bitmapped graphics to screen was relatively fast in macOS 10.6 using the obvious APIs (CoreGraphics/Quartz/etc), and when drawing to non-retina, normal displays in newer macOS versions. Drawing bitmapped graphics to (the now dominating) Retina displays, however, got slow. In the case of a Retina Macbook Pro, it was a little slow. The 5k iMac's display updates were excruciatingly slow when using the classic APIs (due to the high pixel count, and the expensive conversion to the 30-bit color which these displays support).

The first thing I looked at was the wantsLayer attribute of NSView:
After seeing that enabling layers wasn't going to help the 5k iMacs (the ones that needed help the most!), I looked into Metal, which is supported on 10.11+ (provided you have sufficient GPU, which it turns out not all macs that 10.11 supports do). After a few hours of cutting and pasting example code in different combinations and making a huge mess, I did manage to get it to work. I made a very hackish build of the LICE test app, and had some people (who actually have 5k iMacs) test the performance, to see if it would improve things.

It did (substantially), so it was followed by a longer process of polishing the mess of a turd into something usable, which is not interesting, though I should note: This stuff is now in the "metal" branch of swell in WDL, and will go to master when it makes sense. This is what is in the latest +dev REAPER builds, and it will end up in 6.0. I'm waiting for it to completely bite me in the ass (little things do keep coming up, hopefully they will be minor).

As one final note, I'd just like to admonish Apple for not doing a reasonable implementation of all of this inside CoreGraphics. The fact that you can't update the 5k iMac's screen via traditional APIs at an even-halfway-decent rate is really stupid.

P.S. It does seem that if you want to have your application support Dark Mode, you can't use [NSView lockFocus] anymore either, so if you wish to draw out-of-context, you'll have to use Metal...

Recordings:

Decanted Youth - 1 - Supposed to Be -- [8:14]
Decanted Youth - 2 - (Vaguely Instrumental) Legacy -- [16:11]
Decanted Youth - 3 - (Vaguely) Round and Round -- [5:15]
Decanted Youth - 4 - The Squeeze -- [6:31]
Decanted Youth - 5 -- [3:16]
Decanted Youth - 6 - (mini cover medley) -- [10:03]
Decanted Youth - 7 -- [9:15]
Decanted Youth - 8 -- [8:26]
Decanted Youth - 9 - Trees and Mold -- [9:37]
Decanted Youth - 10 -- [4:53]
Decanted Youth - 11 -- [9:05]
Decanted Youth - 12 -- [7:02]

6 Comments
the lowest end
July 13, 2017
To follow up on my last article about Linux on the ASUS T100TA, I recently acquired (for about $150) an ASUS C201 Chromebook, with a quad-core (1.8ghz?) ARM processor, 4GB RAM, and a tiny 16GB SSD. This is the first time I've used a Chromebook, and ChromeOS feels not-so-bad. I wish we could target it directly!

...but we can't! At least, not without going through Javascript/WebAssembly/whatever. Having said that, one can put it in developer mode (which isn't difficult but also is sort of a pain in the ass, especially when it prompts you whenever it boots to switch out of developer mode, which if you do will wipe out all of your data, ugh). In developer mode, you can use Crouton to install Linux distributions in a chroot environment (ChromeOS uses a version of the Linux kernel, but then has its own special userland environment that is no fun).

I installed Ubuntu 16.04 (xenial) on my C201, and it is working fine for the most part! It's really too bad there's no easy way to install Ubuntu completely native, rather than having to run it alongside ChromeOS. ChromeOS has great support for the hardware (including sleeping), whereas when you're in the Ubuntu view, it doesn't seem you can sleep. So you have to remember to switch back to ChromeOS before closing the lid.

So I built REAPER on this thing, fun! And I still have a few GB of disk left, amazingly. Found a few bugs in EEL2/ARM when building with gcc5, fixed those (I'm now aware of __attribute__((naked)), and __clear_cache()).

Some interesting performance comparisons, compiling REAPER: REAPER v5.50rc6 (48khz, 256 spls, stock settings), "BradSucks_MakingMeNervous.rpp" from old REAPER installers -- OGG Vorbis audio at low samplerates, a few FX here and there, not a whole lot else: (The T100TA's ALSA drivers are rough, can't do samplerates other than 48khz, can't do full duplex...)

Overall both of these cheapo laptops are really quite nice, reasonably usable for things, nice screens, outstanding battery life. If only the C201 could run Linux directly without the ugly ChromeOS developer-mode kludge (and if it had a 64GB SSD instead of 16GB...). Also, I do miss the T100TA's charge-from-microUSB (the C201 has a small 12V power supply, but charging via USB is better even if it is slow).

I'll probably use the T100TA more than the C201 -- not because it's slightly faster, but because I feel like I own it, whereas on the C201 I feel like I'm a guest of Google's (as a side note, apparently you can install a fully native Debian, but I haven't gotten there yet.. The fact that you have to use the kernel blob from ChromeOS makes me hesitate more, but one of these days I might give it a shot).

4 Comments
I've been working on a REAPER linux port for a few years, on and off, but more intensely the last month or two. It's actually coming along nicely, and it's mostly a lot of fun (except for getting clipboard/drag-drop working, ugh that sucked ;). Reinventing the world can be fun, surprisingly.

I've also been a bit frustrated with Windows (that crazy defender/antispyware exploit comes to mind, but also one of my Win10 laptops used to update when I didn't want it to, and now won't update when I do), so I decided to install linux on my T100TA. This is a nice little tablet/laptop hybrid which I got for $200, weighs something like 2 pounds, has a quad core Atom Bay Trail CPU, 64GB of MMC flash, 2GB of RAM, feels like a toy, and has a really outstanding battery life (8 hours easily, doing compiling and whatnot). It's not especially fast, I will concede. Also, I cracked my screen, which prevents me from using the multitouch, but other than that it still works well.

Anyway, linux isn't officially supported on this device, which boots via EFI, but following this guide worked on the first try, though I had to use the audio instructions from here. I installed Ubuntu 17.04 x86_64.

I did all of the workarounds listed, and everything seemed to be working well (lack of suspend/hibernate is an obvious shortcoming, but it booted pretty fast), until the random filesystem errors started happening. I figured out that the errors were occurring on read, the most obvious way to test would be to run:
debsums -c
which will check the md5sum for the various files installed by various packages. If I did this with the default configuration, I would get random files failing. Interestingly, I could md5sum huge files and get consistent (correct) results. Strange. So I decided to dig through the kernel driver source, for the first time in many many years.

Workaround 1: boot with:
sdhci.debug_quirks=96
This disables DMA/ADMA transfers, forcing all transfers to use PIO. This solved the problem completely, but lowered the transfer rates down to about (a very painful) 5MB/sec. This allowed me to (slowly) compile kernels for testing (which, using the stock ubuntu kernel configuration, meant a few hours to compile the kernel and the tons and tons of drivers used by it, ouch. Also I forgot to turn off debug symbols so it was extra slow).

I tried a lot of things, disabling various features, getting little bits of progress, but what finally ended up fixing it was totally simple. I'm not sure if it's the correct fix, but since I've added it I've done hours of testing and haven't had any failures, so I'm hoping it's good enough. Workaround 2 (I was testing with 4.11.0):
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2665,6 +2665,7 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 				 */
 				host->data_early = 1;
 			} else {
+				mdelay(1); // TODO if (host->quirks2 & SDHCI_QUIRK2_SLEEP_AFTER_DMA)
 				sdhci_finish_data(host);
 			}
 		}
Delaying 1ms after each DMA transfer isn't ideal, but typically these transfers are 64k-256k, so it shouldn't cause too many performance issues (changing it to usleep(500) might be worth trying too, but I've recompiled kernel modules and regenerated initrd and rebooted way way too many times these last few days). I still get reads of over 50MB/sec which is fine for my uses.

To be properly added it would need some logic in sdhci-acpi.c to detect the exact chipset/version -- 80860F14:01, not sure how to more-uniquely identify it -- and a new SDHCI_QUIRK2_SLEEP_AFTER_DMA flag in sdhci.h. I'm not sure this is really worth including in the kernel (or indeed if it is even applicable to other T100TAs out there), but if you're finding your disk corrupting on a Bay Trail SDHCI/MMC device, it might help!

6 Comments
TL;DR: Retina iMac (4k/5k) owners can greatly improve the graphics performance of many applications (including REAPER) by setting the color profile (in System Preferences, Displays, Color tab) to "Generic RGB" or "Adobe RGB." (and restarting REAPER and/or other applications being tested)

I previously wrote in mid-2014 about the state of blitting bitmaps to screen on modern OS X (now macOS) versions. Since then, Apple has released new hardware (including Retina iMacs) and a couple of new macOS versions.

Much of that article is still useful today, but I made a mistake in the second update: while the hack there was helpful (and did decrease the amount of time spent blitting), my explanation was wrong -- the real reason for the faster blit was that the system was parallelizing it across multiple cores. So it was faster, but it also used more CPU (and was generally wasteful).

I discovered this because I've been researching how to improve REAPER's graphic performance on the iMac 5k in particular, so I started benchmarking. This time around, I figured I should measure how many screen pixels are updated and divide that by how long it takes. Some results, based on my memory (I'm not going to rerun them for this article, laziness).

Initial version (REAPER 5.32 state, using the retina hack described above, public WDL as of today): The one that really jumped out at me was the Retina iMac 5k -- it's a quarter of the speed of the RMBP! WTF. We'll get to that later.

After I realized the hack above was actually doing more work (thank you, Xcode instrumentation), I did some more experiments avoiding the hack. I found that the newer SDKs have kCGImageByteOrderXYZ flags (I don't believe they were in previous SDKs), that these are aliased to kCGBitmapByteOrderXYZ, and that using kCGBitmapByteOrder32Host in the pixel format for CGImageCreate()/etc would speed things up. With retina hack removed: With retina hack removed and byte order set to host: The non-retina displays might have changed slightly, but it was insignificant. So, by setting the byte order to native, we get the Retina MBP close to the level of performance of the hack, which isn't great but is serviceable, and at least the CPU use is decreased. This also has the benefit (drawback?) of making the byte-order of pixels the same on macOS/Intel and win32, which will take some more attention (and a lot of testing).

From profiling and looking at the code, this blit performance could easily be improved by Apple -- the inner loop where most time is being spent does a lot more than it needs to. Come on Apple, make us happy. Details offered on request.

Of course, this really doesn't do anything for the iMac 5k -- 200MPix/sec is *TERRIBLE*. The full screen is 15 megapixels, so at most that gets you around 13fps, and that's at 100% CPU use. After some more profiling, I found that the function chewing the most CPU ended in "64". Then it hit me -- was this display running in 16 bits per channel? A quick google search later, it was clear: the Retina iMacs have 10-bit displays, and you can run them in 10 bits per channel, which means 64 bits per pixel. macOS is converting all of our pixels to 64 bits per pixel (I should also mention that it seems to be doing a very slow job of it). Luckily, changing the color profile (in system preferences, displays) to "Generic RGB" or similar disables this, and it gets the ~800MPix/sec level of performance similar to the RMBP, which is at least tolerable.

Sorry for the long wordy mess above; I'm posting it here so that google finds it and anybody looking into why their software is slow on macOS 10.11 or 10.12 on retina imacs has some explanation.

Also please please please Apple optimize CGContextDrawImage()! I'm drawing an image with no alpha channel and no interpolation and no blend mode and the inner loop is checking each pixel to see if the alpha is 255? I mean wtf. You can do better. Hell, you've done way better. All that "new" Retina code needs optimizing!

Update a few hours later:
Fixing various issues with the updated byte-ordering, CoreText produces quite different output for CGBitmapContexts created with different byte orderings:


Hmph! Not sure which one is "correct" there... hmm... If you use kCGImageAlphaPremultipliedFirst for the CGBitmapContext rather than kCGImageAlphaNoneFirst, then it looks closer to the original, maybe. ?

Also other caveat: NSBitmapImageRep can't seem to deal with the ARGB format either, so if you use that you need to manually bswap the pixels...

Update (2019): worked around most of this issue by using Metal, read here.

4 Comments

I've been investigating, once again, the performance of drawing code-rendered RGBA bitmaps to NSViews in OSX. I found that on my Retina Macbook Pro (when the application was not in low-resolution legacy mode), calling CGContextSetInterpolationQuality with kCGInterpolationNone would cause CGContextDrawImage() to be more than twice as fast (with less filtering of the image, which was a fair tradeoff and often desired).

The above performance gain aside, I am still not satisfied with the bitmap drawing performance on recent OSX versions, which has led me to benchmark SWELL's blitting code. My test uses the LICE test application, with a screen full of lines, an opaque NSView, and 720x500 resolution.

OSX 10.6 vs 10.8 on a C2D iMac

My (C2D 2.93GHz) iMac running 10.6 easily runs the benchmark at close to 60 FPS, using about 45% of one core, with the BitBlt() call typically taking 1ms for each frame.

Here is a profile -- note that CGContextDrawImage() accounts for a modest 3.9% of the total CPU use:


It might be possible to reduce the work required by changing our bitmap representation from ABGR to RGBA (avoiding sseCGSConvertXXXX8888TransposeMask and performing a memcpy() instead), but in my opinion 1ms for a good sized blit (and less than 4% of total CPU time for this demo) is totally acceptable.

I then rebooted the C2D iMac into OSX 10.8 (Mountain Lion) for a similar test.

Running the same benchmark on the same hardware in Mountain Lion, we see that each call to BitBlt() takes over 6ms, the application struggles to exceed 57 FPS, and the CPU usage is much higher, at about 73% of a core.

Here is the time sampling of the CGContextDrawImage() -- in this case it accounts for 36% of the total CPU use!


Looking at the difference between these functions, it is obvious where most of the additional processing takes place -- within img_colormatch_read and CGColorTransformConvertData, where it apparently applies color matching transformations.

I'm happy that Apple cares about color matching, but forcing it on (without allowing developers control over it) is wasteful. I'd much rather have the ability to transform the colors before rendering, and be able to quickly blit to screen, than have every single pixel pushed to the screen color transformed. There may be some magical way to pass the right colorspace value to CGImageCreate() to bypass this, but I have not found it yet (and I have spent a great deal of time looking, and trying things like querying the monitor's colorspace).

That's what OpenGL is for!
But wait, you say -- the preferred way to quickly draw to screen is OpenGL.

Updating a complex project to use OpenGL would be a lot of work, but for this test project I did implement a very naive OpenGL blit, which enabled an OpenGL context for the view and created a texture for drawing each frame, more or less like:

    glDisable(GL_TEXTURE_2D);
    glEnable(GL_TEXTURE_RECTANGLE_EXT);

    GLuint texid=0;
    glGenTextures(1, &texid);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, texid);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, sw);
    glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MIN_FILTER,  GL_LINEAR);
    glTexImage2D(GL_TEXTURE_RECTANGLE_EXT,0,GL_RGBA8,w,h,0,GL_BGRA,GL_UNSIGNED_INT_8_8_8_8, p);

    glBegin(GL_QUADS);

    glTexCoord2f(0.0f, 0.0f);
    glVertex2f(-1,1);
    glTexCoord2f(0.0f, h);
    glVertex2f(-1,-1);
    glTexCoord2f(w,h);
    glVertex2f(1,-1);
    glTexCoord2f(w, 0.0f);
    glVertex2f(1,1);

    glEnd();

    glDeleteTextures(1,&texid);
    glFlush();
This resulted in better performance on OSX 10.8, each BitBlt() taking about 3ms, framerate increasing to 58, and the CPU use going down to about 50% of a core. It's an improvement over CoreGraphics, but still not as fast as CoreGraphics on 10.6.

The memory use when using OpenGL blitting increased by about 10MB, which may not sound like much, but if you are drawing to many views, the RAM use would potentially increase with each view.

I also tested the OpenGL implementation on 10.6, but it was significantly slower than CoreGraphics: 3ms per frame, nearly 60 FPS but CPU use was 60% of a core, so if you do ever implement OpenGL blitting, you will probably want to disable it for 10.6 and earlier.

Core 2 Duo?! That's ancient, get a new computer!
After testing on the C2D, I moved back to my modern quad-core i7 Retina Macbook Pro running 10.9 (Mavericks) and did some similar tests.

  • Normal: 12-14ms per frame, 46 FPS, 70% of a core CPU use
  • Normal, in "Low Resolution" mode: 6-7ms per frame, 58FPS, 60% of a core CPU use
  • Normal, without the kCGInterpolationNone: 29ms per frame, 29 FPS, 70% of a core CPU use
  • Normal, in "Low Resolution" mode, without kCGInterpolationNone: same as with kCGInterpolationNone.
  • GL: 1-2ms per frame, 57 FPS, 37% of a core CPU
  • GL, in "Low Resolution" mode: 1-2ms per frame, 57 FPS, 40% of a core CPU
Interestingly, "Low Resolution" mode is faster in all modes except for GL, where apparently it is slower (I'm guessing because the hardware accelerates the GL scaling, whereas "Low Resolution" mode puts it through a software-scaler at the end).

Let's see where the time is spent in the "Normal, Low Resolution" mode:

This looks very similar to the 10.8, non-retina rendering, though some function names have changed. There is the familiar img_colormatch_read/CGColorTransformConvertData call which is eating a good chunk of CPU. The ripc_RenderImage/ripd_Mark/argb32_image stack is similar to 10.8, and reasonable in CPU cycles consumed.

Looking at the Low Resolution mode, it really does behave similarly to 10.8 (though it's depressing to see that it still takes as long to run on an i7 as 10.8 did on a C2D, hmm). Let's look at the full-resolution Retina mode:

img_colormatch_read is present once again, but what's new is that ripc_RenderImage/ripd_Mark/argb32_image have a new implementation, calling argb32_image_mark_RGB24 -- and argb32_image_mark_RGB24 is a beast! It uses more CPU than just about anything else. What is going on there?

Conclusions
If you ever feel as if modern OSX versions have gotten slower when it comes to updating the screen, you would be right. The basic method of drawing pixels rendered in a platform-independent fashion to screen has gotten significantly slower since Snow Leopard, most likely in the name of color-accuracy. In my opinion this is an oversight on Apple's part, and they should extend the CoreGraphics APIs to allow manual application of color correction.

Additionally, I'm suspicious that something odd is going on within the function argb32_image_mark_RGB24, which appears to only be used on Retina displays, and that the performance of that function should be evaluated. Improving the efficiency of that function would have a positive impact on the performance of many third party applications (including REAPER).

If anybody has an interest in duplicating these results or doing further testing, I have pushed the updates to the LICE test application to our WDL git repository (see WDL/lice/test/).

Update: July 3, 2014
After some more work, I've managed to get the CPU use down to a respectable level in non-Retina mode (10.8 on the iMac, 10.9/Low Resolution on the Retina MBP), by using the system monitor's colorspace:

    CMProfileRef systemMonitorProfile = NULL;
    CMError getProfileErr = CMGetSystemProfile(&systemMonitorProfile);
    if(noErr == getProfileErr)
    {
      cs = CGColorSpaceCreateWithPlatformColorSpace(systemMonitorProfile);
      CMCloseProfile(systemMonitorProfile);
    }
Using this colorspace with CGImageCreate() prevents CGContextDrawImage from calling img_colormatch_read/CGColorTransformConvertData/etc. On the C2D running 10.8, this gets it down to 1-2ms per frame, which is reasonable.

However, this mode appears to be slower on the Retina MBP in high resolution mode, as it calls argb32_image_mark_RGB32 instead of argb32_image_mark_RGB24 (presumably operating on my buffer directly rather than the intermediate colorspace-converted buffer), which is even slower.

Update: July 3, 2014, later
OK, if you provide a bitmap that is twice the size of the drawing rect, you can avoid argb32_image_mark_RGBXX, and get the Retina display to update in about 5-7ms, which is a good improvement (but by no means impressive, given how powerful this machine is). I made a very simple software scaler (that turns each pixel into 4), and it uses very little CPU. So this is acceptable as a workaround (though Apple should really optimize their implementation). We're at least around 6ms, which is way better than 12-14ms (or 29ms which is where we were last week!), but there's no reason this can't be faster. Update (2017): the mentioned method was only "faster" because it triggered multiprocessing, see this new post for more information.

As a nice side effect, I'm adding SWELL_IsRetinaDC(), so we can start making some things Retina aware -- JSFX GUIs would be a good place to start...

5 Comments


Hooray! Thank you to my fine coworkers/coconspirators: Schwa, Christophe, Ollie, White Tie, Geoff, and all of the lovely people who helped test and gave valuable feedback. Schwa and White Tie did a fantastic job on the new web site, too, I must say. <3

Soon after posting the new website (thank you, git, for making that easy), and after grabbing a celebratory coffee, I noticed the web site was a bit slow. The CPU use was low (an Amazon EC2 instance), but on looking at the bandwidth graph I saw it was pushing a ridiculous amount of traffic (presumably saturating the link, even). After mirroring the downloads to a CDN (CloudFront), all was well. Thank you, Amazon. I'm very impressed with AWS/EC2/etc. It's good stuff.

8 Comments

Fun with denormals
July 14, 2011
Something that often comes up when writing audio applications is "denormals". These are floating point numbers that are so small that CPU designers assume they are quite rare (OK, so they are), and the circuitry that processes them is very slow compared to that for a "normal" sized number.

If you read that (awful) explanation, and didn't understand it or care, I apologize, you can stop reading now because it will only get more boring and less intelligible.

We have made some common code to filter out these numbers, as many others no doubt have done:
// boring code omitted, see WDL/denormal.h for the full code
// WDL_DENORMAL_DOUBLE_HW() is a macro which safely gets you the high 32 bits of a double as an unsigned int.

static double WDL_DENORMAL_INLINE denormal_filter_double(double a)
{
  return (WDL_DENORMAL_DOUBLE_HW(&a)&0x7ff00000) ? a : 0.0;
}
The above code pretty simply looks at the exponent field of the double, and if it is nonzero, returns the double, otherwise it returns 0.

Recently it came to our attention that we actually needed to filter out larger numbers as well (when sending those numbers through a process such as an FFT, they would end up as denormals). If we pick a number around 10^-16 (not being picky about the exact cutoff), which has an exponent field of 0x3C9, we can choose to filter when the exponent field is under that value:
static double WDL_DENORMAL_INLINE denormal_filter_double_aggressive(double a)
{
  return ((WDL_DENORMAL_DOUBLE_HW(&a))&0x7ff00000) >= 0x3c900000 ? a : 0.0;
}
That was pretty much free (ok slightly larger code, I suppose). One nice thing that became apparent was that we could filter NaN and infinity values this way as well (exponent == 0x7FF), with only the cost of a single integer addition:
static double WDL_DENORMAL_INLINE denormal_filter_double_aggressive(double a)
{
  return ((WDL_DENORMAL_DOUBLE_HW(&a)+0x100000)&0x7ff00000) >= 0x3cA00000 ? a : 0.0;
}
Note that the exponent is increased by 1, so that 0x7FF becomes 0, and we adjusted the cutoff constant for the change.

An extra thought: if you need to pick the cutoff number more precisely, you could change the mask to 0x7fffffff and the cutoff (0x3cA00000) to include some of the fraction digits...

Additional reading: IEEE_754-1985.

Oh and happy bastille day!

3 Comments
Today I found this out
January 10, 2010
(after spending much of the day banging my head against the wall)
#include <stdio.h>
struct test1 {  double b; };
struct test2 { int a; test1 b; };
test2 foo;
int main() 
{
  printf("%d\n",(int)&foo.b.b - (int)&foo);
  return 0;
}
What does this print? On Windows, it prints 8. On OS X (or linux), it prints 4. Which means, if you access foo.b.b a lot, it will be slow. UGH. I guess that's why there's -malign-double for gcc. Now if I can just figure out how to enable that for Xcode...



Recordings:

freeform jam with brennewtnoj

7 Comments

It does seem to relate to one of my processes (ninjamsrv), but still, shouldn't that not be able to bring the system to its knees?

Debian, linux 2.6.18 on a 2.4GHz P4.. bleh. The remote power switching box gets lots of use (every few weeks) as a result.

---

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Oops: 0000 [#1]

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: SMP

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: CPU: 0

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: EIP is at do_page_fault+0xa0/0x481

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: eax: f695c030 ebx: 6b67cadd ecx: 0000007b edx: 00000000

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: esi: a714c051 edi: 48f20084 ebp: 48f20000 esp: f695c00c

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: =======================

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Oops: 0000 [#2]

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: SMP

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: CPU: 0

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: EIP is at show_trace_log_lvl+0x3e/0x6a

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: eax: 00001ffd ebx: 0000007b ecx: 00000046 edx: 00000000

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: esi: c0291c24 edi: 00000000 ebp: c0291d8d esp: f695bf24

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: ds: 007b es: 007b ss: 0068

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Process ninjamsrv (pid: 2490, ti=f695a000 task=dfa82000 task.ti=f695a000)

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Stack: f695c06f 00000018 00000000 c0291d8d c0103c21 c0291d8d c0291c52 c0291d8d

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: f695bfd8 f695c00c 00000002 00010206 f695bfd8 f695c00c c0103d51 c0291d8d

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: c0291d81 00000001 00000068 c0115344 00000000 f695bfd8 00000206 c0103f44

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Call Trace:

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Code: 1c 56 68 5d 1e 29 c0 e8 9a 9d 01 00 89 f2 b8 df bc 29 c0 e8 06 25 03 00 58 5a 83 c3 04 39 fb 76 2a 8d 87 fd 1f 00 00 39 c3 73 20 <8b> 33 89 f0 e8 cc 83 02 00 85 c0 74 e2 eb c7 55 89 cb 68 24 1c

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: EIP: [] show_trace_log_lvl+0x3e/0x6a SS:ESP 0068:f695bf24

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: ds: 007b es: 007b ss: 0068

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Process ninjamsrv (pid: 2490, ti=f695a000 task=dfa82000 task.ti=f695a000)

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Stack: 00000000 f695c030 e671cf24 31635f75 6b67cadd a714c051 c01152a4 48f20000

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: c01037f9 6b67cadd 0000007b 00000000 a714c051 48f20084 48f20000 f695c088

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: 2098007b dcb8007b ffffffff c0115344 00000060 00010206 00000000 f695c088

Message from syslogd@test at Fri Feb 20 06:28:04 2009 ... test kernel: Call Trace:

4 Comments

to my love, OS X.
February 14, 2008
JUST KIDDING. Quite the opposite in fact. It's been really exhausting porting stuff to OS X. Here are some reasons why:

1) Poor (and often hard to find) documentation-- Yes, some of the newer APIs are decently documented, but dig in and try to use ATSU to render text, and it's a world of pain. You end up looking through header files that all seem to assume you already know what to do. This is tolerable, though; with enough digging you can find what you want.

2) The AudioUnits SDK-- the API for AudioUnits is defined in a header, but not documented. So to use AU, you'd have to either just use the SDK (with EXTENSIVE amounts of code), or reverse engineer it to figure out what calls you need to do to control the plug-ins yourself. Someone obviously spent a lot of time defining an extensible plug-in API, why the fuck don't they document it?! I mean, really, just a "first, call this, then, call that, then, when you're ready to process X, do Y." If this info is somewhere, someone please let me know... (see the next point)

3) The previous two points might be related to the fact that Apple seems to assume that as a Mac developer, I've been developing for Macs continuously since 1984, and have religiously read the developer mailing list since whenever it was created. Apple: for the love of god find some way of getting those mailing list posts linked to/from the relevant documentation pages.

4) There are WAY too many ways to accomplish similar things. The classic example which I bitch about a lot is text rendering--last I checked, there is CoreText (apparently awesome, but 10.5 only), CoreGraphics text functions (seem nice, but lots of limitations including non-functioning text measuring), HITheme rendering (which is nice but doesn't give you much for font style selection), AppKit NSString/NSAttributedString drawing (great, but slow), ATSUI (seems to be the best all around but takes a bit to get to the point where you get what's going on). I understand that there are historical reasons for these APIs, but again, this can be fixed with proper documentation (perhaps a page describing all of the APIs and their benefits and drawbacks).

5) Addition of new APIs in new OS versions. I know Apple wants to sell new OS versions, but from a developers standpoint, it's really difficult to properly support multiple versions of OS X. I'd like to use new OS features if available, but fall back to old versions if not. If there's a clean way to do this, I'd love to hear about it -- on Windows we usually just load the appropriate DLLs if available..

6) Performance on OS X for basic graphics drawing seems terrible. Perhaps if you take advantage of the highly OS X specific stuff, you can get around some of this, but as an example I made two native projects, one for OS X and one for Win32, that create a 640x480 window and try to draw at about 30fps. They fill the background black and draw white lines. On Windows basic double buffering is used, on OS X the system buffers the drawing. The OS X version uses Cocoa and CoreGraphics to draw, and the view is opaque.

The source code which you can build is here (VC6/win and Xcode2.4+ for OS X required).

Results, on the same Core2 hardware: OS X: 11% CPU use. WinXP: 1% CPU use. In fairness to OS X, it was drawing pretty antialiased lines; however, when I disabled AA on the OS X build, the CPU use went _UP_ to 20%. Go figure. It's not really the line drawing, either--make it draw just one line and the numbers don't change much...

and Windows...
February 14, 2008
Well, I don't have anything to really bitch about Windows right now, but I'm really disappointed with all of the VC++ builds after VC6. The issues are aplenty, and they make it hard to upgrade. All dynamic linking uses msvcrt71.dll etc., which even MS doesn't distribute anymore, so you end up having to static link. Bleh.

I guess most people don't care about the size of their software, but for us keeping the program size down is also part of keeping the development process fast and efficient. If I have to upload a 30mb installer and everybody has to download one to test...

14 Comments

hoofin..
September 22, 2007
Listening to an mp3 album and reading Deerhoof's diary (both here).. awesome. Those guys rule.

wow i just noticed...
September 22, 2007
Radiohead (my favorite band) is now selling all of their catalogue online in DRM-free MP3. WOW. This is awesome.

I might have to buy some of the B sides I don't have...

four paths to mediocrity
September 22, 2007
I've just started using my fourth mobile phone this year. They all are lacking, in mostly different ways...

The Motorola MPx 220

This is the phone I've had the longest. It has a great flip phone form factor (which makes it very satisfying to hang up), and while it's technically a smart phone (running Windows CE of all OSes), it has good phone features (the dialing feels nice, and it's easy to look up names in the address book using the number pad). It was a pain to sync with the computer without using some shareware backup software, though. There was no pointing device, the data connection was a bit slow, and the browser basic, but even though the screen was pretty small, you could play videos using TCPMP. I also like how easy it is to charge via the USB cable.

The Palm Treo 650

I think I had the biggest complaints with the Treo (in fact, I think it's the one I'd be the least likely to go back to). From crashing silently and not answering calls, to the touch screen annoying my face, to it being the largest, the list can go on. The upsides are the huge software library (mmm the VNC client was very usable), and incredibly good battery life.

The Nokia E70

This phone (which isn't really offered by any carriers in the US) was great, and I think I may still use it. Its bluetooth support is by far the best (I used BT dial up networking + GPRS on a 3 hour train ride last month and it was outstanding), and the phone has great features (WiFi, fast EDGE data, a great keyboard that flips out, a good SSH client that you can leave running and come back to, even after making/receiving calls or running other apps, etc). The screen is really high resolution, too, but it's a bit small (so you really have to get very close to read it). I really loved my old Nokia TDMA phones, but I think Nokia needs to reevaluate some decisions they made on the E70's firmware. For example, there's a button on the left side of the phone that opens the voice recorder and starts it recording. It's very easy to hit, and there's no way to override this function! Ack. Also there's no auto-lock after a certain amount of time, so you have to be sure to manually lock the keypad. Oh, and little things like looking up names in the contact list, with the keyboard closed, is much harder than it needs to be. The MPX220's method was far superior.

The Apple iPhone

I just got this yesterday (thanks DB), and there are definitely things to like. I dig the screen. It's about the same resolution as the E70's, but oh, it's so much bigger and easier to read. The web browser is fantastic. The phone is decent. The homebrew software is getting there (the VNC client looked nice, but no keyboard yet? ack). Mmm, a real headphone jack, too. Why couldn't they do that on the E70? And it autolocks too. Hello, Nokia? But alas, no BT DUN (sad). The touchscreen keyboard, while as good as I've seen for the type, makes me wish I had the E70's again... Oh well...

So maybe I'll start carrying the iPhone and take the E70 with me on trips (for when I need BT DUN)...

anyway

3 Comments

porting frustration
December 15, 2006
(9/10 on the boring scale)

So I just spent the last day or two (I honestly can't remember) getting Jesusonic to run on OS X/x86. I already had it running on OS X/PPC, so I figured it would be easy. Wrooong...

Jesusonic works by compiling code on the fly into native assembly. It actually does this by gluing together stubs of functions (the most critical of which are written in assembly for each platform) into code that it can execute. Overall the code it generates is not terribly optimized, but it's definitely not slow either.

So this is what I found in porting JS.

1) the compiler bitched about my assembly code trashing EBX. Turns out, EBX is used for position independent code (PIC) addressing. So I figured I should probably not be messing with EBX, and that the OS needed it. With a bit of additional work on the compile side I got it to not use EBX, but in the end it turned out that EBX only really needs to be preserved within functions that use it, and for my uses I really could use it. Oh well. Time wasted, but hey, kinda useful. Stuff still wasn't working right.

2) Many of the assembly stubs for particular functions needed to call C code, whether it be a C library function like pow(), or some of our own code (FFTs, file reading, accessing memory, etc). Since GCC was generating my functions as PIC, the extended assembly syntax failed to assemble (ending up with assembly like "movl ((symbolname-$LABELBLAHBLAH(%ebx))), %edi", etc.). So it turns out MY compile step needs to actually generate absolute addresses at runtime, instead of at compile time. Fair enough; that took an hour or so, and a bunch of testing/fixing to make sure that nothing broke.

3) And this was the big bitch, and it took me a long time to figure out. Turns out, and this is well documented, you have to keep the stack aligned to 16 bytes. I would call pow(), and it would end up trying to do an SSE load/store at an unaligned address, and things would proceed to blow up. So I had to go update all of my stubs and functions to keep the stack nicely aligned, which is probably not a bad idea anyway. Once I finally got them all correct, I tried it out, and... it still didn't work. So I ended up spending a lot of time with GDB (Xcode's debugger won't let me see registers, argh), and figured out that, indeed, the stack was aligned when I called my generated code, but no, the stack wasn't aligned when it got to pow().

After changing some build settings, I found that with -O0, it did in fact work. So then I did some gcc -S -O0 file.c and gcc -S -O2 file.c and compared the generated code for the assembly stubs, and it seems that with -O2, gcc itself would let the stack get unaligned, as long as my stub wasn't obviously calling another function.

I looked for a long time to see if I could disable this in gcc, and I gave up, so on OS X/x86 Jesusonic will have this code for each stub:

pushl %ebp          # save the caller's frame pointer
movl %esp, %ebp     # set up our own frame
andl $-16, %esp     # force 16-byte stack alignment

(run code)

leave               # restore %esp and %ebp from the frame

that way, whenever I call out, I can ensure that the stack is aligned, no matter what kind of crap GCC is generating for the function.
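In C terms, all that andl $-16, %esp is doing is rounding the stack pointer down to a 16-byte boundary (clearing the low four bits). A minimal sketch of the arithmetic:

```c
/* Sketch of the alignment arithmetic behind "andl $-16, %esp".
 * The i386 ABI on OS X expects the stack 16-byte aligned at call
 * sites so SSE loads/stores of stack data don't fault. */
#include <assert.h>
#include <stdint.h>

/* True if an address sits on a 16-byte boundary. */
static int is_16_aligned(uintptr_t p)
{
    return (p & 15u) == 0;
}

/* Same effect as "andl $-16, %esp": -16 is ...11110000 in two's
 * complement, so ANDing clears the low 4 bits, rounding down. */
static uintptr_t align_down_16(uintptr_t p)
{
    return p & ~(uintptr_t)15;
}
```

Rounding *down* is the safe direction for a stack that grows downward: the adjusted %esp stays inside memory the function already owns.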

The better way of dealing with this would probably be to write these functions in assembly directly, or to improve the code that cuts up the stubs so it filters out the stack-frame setup that GCC produces anyway, but hell, I'm too lazy, this works, and it's reasonably fast as it is. And most importantly, I get to get back to the fast, satisfying building of UI and porting of easy things.



4 Comments

and here we are.
April 12, 2006
...where to begin. Tom got married last weekend. Was one hell of a fun weekend. I was a bit worried it would be too uncomfortable, out among the orange trees, but the weather turned out great, Brennan and Isabelle brought an RV, and we all had a good time (and punched oranges).

Got to fly the little RC heli out there too, which was fun.. so when I got back I finally installed the upgraded symmetrical rotors, new motor and heatsinks, and LiPo battery and voltage alarm. Flew it inside today, though didn't do anything fancy since I'm not that good yet.

REAPER news
April 12, 2006
I heard back from Mackie, with a response that stupefied and angered me. They only license documentation for the MCU under NDA to select companies (as in, not us). They don't seem to care about their customers, just their own (less than obvious to me) interests. I can only hope I don't get the same nonsense from Propellerheads, because I think people using REAPER will want ReWire more than full MCU support anyway (REAPER already supports the MCU somewhat, but without doing a lot more research it can't do all the fancy things that the MCU is capable of). Sigh. The pain of trying to enter a market that is very established and saturated. But I'm only mildly deterred. If Propellerheads *does* dick me around, I will do one of two things-- either go to Stockholm and try to convince them with beer (though don't get any ideas, guys), or just do what I've been meaning to for a while, and port JACK to Windows.

MacBook Poo
April 12, 2006
Speaking of Windows, after BootCamp was released, Steve convinced me to get (at a discount, at least) a MacBook Pro. So I did. For running OS X it's pretty good, but as a Windows XP laptop it really kinda sucks. There are a few driver issues that need fixing (no backlit keyboard in Windows, sound always goes out the speakers, using the built-in camera causes a BSoD, etc), but the big hardware issue is that the Mac keyboard is lacking keys. Specifically: insert and backspace (delete). I did find a good 3rd-party utility that helps a lot: it allows all sorts of remapping and enables the Mac's Fn key. And finally, the thing gets HOT. I guess that's to be expected with 2x 2GHz cores, but still. I got spoiled by my little Sony's 1GHz P-M that runs nice and cool (and, granted, slow).

Oh yes, and southpark this week was SO awesome.

Recordings:

notthefirsttimealone

9 Comments

and 0.44
January 10, 2006
(of REAPER)... is up.

The biggest single thing in it is that I made a nice system that buffers all source material in another thread, so the audio thread doesn't have to wait around for a slow network device or disk, etc (well, it still might have to, but it's a lot less likely). All of the effects and mixing of tracks still run in the audio thread, though that may change eventually; for now it makes sense (since you may be monitoring an input on those channels, and would want to have the effects applied on there with as little latency as possible). At any rate, playback is now a LOT more reliable, with fewer little dropouts. It took me a few days of thinking to come up with this compromise, and a few hours to code it, but I'm pretty happy with it, thus far.
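The shape of that kind of read-ahead can be sketched with a ring buffer (names, sizes, and structure here are illustrative, not REAPER's actual implementation): a reader thread keeps the buffer topped up from disk, and the audio thread copies out of it, returning short on underrun instead of blocking.

```c
/* Sketch of a read-ahead ring buffer. A disk-reader thread would call
 * ring_write() to keep it full; the audio thread calls ring_read(),
 * which returns fewer samples ("underrun") rather than blocking.
 * All names/sizes are hypothetical. */
#include <assert.h>

#define RING_SIZE 1024  /* samples; power of two so wrapping is a mask */

typedef struct {
    float buf[RING_SIZE];
    unsigned head;  /* free-running write counter (reader thread) */
    unsigned tail;  /* free-running read counter (audio thread)  */
} ring_t;

/* Samples currently buffered; unsigned wraparound makes this safe. */
static unsigned ring_avail(const ring_t *r)
{
    return r->head - r->tail;
}

/* Producer side: append up to n samples, return how many fit. */
static unsigned ring_write(ring_t *r, const float *src, unsigned n)
{
    unsigned space = RING_SIZE - ring_avail(r), i;
    if (n > space) n = space;
    for (i = 0; i < n; i++)
        r->buf[(r->head + i) & (RING_SIZE - 1)] = src[i];
    r->head += n;
    return n;
}

/* Consumer side: copy out up to n samples, return how many were there.
 * A short return means "buffer ran dry", never "wait for the disk". */
static unsigned ring_read(ring_t *r, float *dst, unsigned n)
{
    unsigned have = ring_avail(r), i;
    if (n > have) n = have;
    for (i = 0; i < n; i++)
        dst[i] = r->buf[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += n;
    return n;
}
```

The design point matching the post: the audio thread's worst case becomes a cheap memcpy-style loop plus a possible short read, instead of a blocking I/O call of unbounded latency.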

The other thing of interest is I made a separate position and playback cursor, so you can see where actions such as splitting will take place (or where you will start playback if you hit play and you were stopped). Makes a lot more sense now.

And there's a bunch of other small things (MIDI peaks now show the approximate notes/durations/etc), VST latency compensation, bla bla bla.

I picked up a $200 Behringer MIDI/USB control surface; going to add support for it tonight, I think. Initially the faders will just map to volume sliders (and be controlled by them), but eventually I'm planning automation modes (automation will be supported with or without a real surface, that is... mmmm).
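The initial fader mapping could be as simple as something like this (a hypothetical linear curve, not REAPER's actual mapping): scale the 7-bit MIDI value so 127 lands on unity gain.

```c
/* Hypothetical fader mapping: 7-bit MIDI value (0..127) to a linear
 * gain factor, with 127 = unity. Not REAPER's actual curve. */
#include <assert.h>

static double fader_to_gain(int cc)
{
    /* Clamp defensively; controllers shouldn't send out-of-range
     * values, but a flaky driver might. */
    if (cc < 0)   cc = 0;
    if (cc > 127) cc = 127;
    return cc / 127.0;
}
```

A real implementation would likely use a dB-shaped taper instead of linear, since perceived loudness is logarithmic, but the clamp-then-scale structure is the same.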

So much to do + no deadline = happy justin.


Friday should be the day
December 21, 2005
El nuevo screenshotto (I know, I know, bad):

Won't be long now. The general todo list is still quite large, but the number of items that need to be done before releasing is quite small (really just a few).

It's so satisfying working in an environment where doing new things is easier than I expect; it makes me happy to get things done quickly, and happier that I created the workable environment to begin with.

I'm hoping I can keep REAPER feeling as sparse as I feel it is now. I might be biased, but it feels less cluttered than a lot of other audio software, and I think that's a really good thing.

Steve kept comparing shit to GarageBand and telling me how I could probably take ideas from it. I'd used it some before, but today I fired it up (well, spun it up is more like it, since I have an ancient 667MHz Mac), and was thoroughly disgusted. I honestly don't know where to begin. Basically, everything UI about it sucks. The audio end of it seems decent, though it's painfully slow on my G4. Anyway. Back to bitching about the UI-- it has a ton of buttons with tiny little icons whose functions aren't really obvious, and there are no tooltips or anything to help you learn; you really have to click, figure out what happens, etc. So lame. And when I drag a nice loop into the project, I can't just resize it larger and have it loop; it makes me go drag in a second copy of it. Lame x10. Anyway.



Recordings:

satanslovesong

6 Comments

planes and pianos
September 22, 2005
Been flying my F-27 more, so much fun. Tried increasing the throw even more 
today, but it was too much, and the servo/radio combination didn't have enough
precision for me to fly as smooth as I'd like-- which is a shame, cause the
snap rolls were fun.

When Francis was in town we got a Slo-V (made by Parkzone as well), too. We
didn't get to fly it while Francis was here, but I managed to yesterday, and
while it is in fact very slow, it's also very underpowered. At 65F and 80' above
sea level, it doesn't seem to have enough power to get out of ground effect!
I don't think I ever got it over 15' up (ground effect is usually 2x the 
wingspan, right?). It does travel nice and slow, though. Anyway...
I picked up a free piano off of craigslist last night (thanks for the help, 
everybody).. Going to start fixing it up a little. And then once it's
somewhere that is more temperate I will have it tuned/etc. Starting to make new
tops for the sharp (black) keys, using some spare basswood lying around. 
If anybody has any good resources for piano parts, let me know. 

A terrible little jam I made with the new piano is here.


3 Comments

As a followup to the last post, here is a picture of the new Jesusonic model in development. 
It's considerably smaller than the CrusFX 1000, and has plenty of room to shrink even more 
for the production model. This is the secular alternative to the CrusFX, and has other 
advantages. This one measures about 21" wide, 9" deep, and 3.25" high. I am pretty confident 
that it will be able to be more like 8" deep and 2.5" high soon.

On a personal note, it's very satisfying going through an iterative design process, learning as
I go, and realizing "hey, the next model I can make it even smaller." 

I think this model is going to get stained, as well. yum. The screen and keyboard will be recessed
into the top of it, the knobs will be along the front of the top (and they will be protected if
I get my way), the CF slot will be on the right side, all the power and audio i/o will be on 
the back, and the right side will have the footboard and expression pedal ports.

Soon I will be building the custom boards for this model. I set up a new little desk just for 
soldering and testing. fun.
 
I still need to find a nice compact backlit keyboard. help.


4 Comments
i really like guitars
January 16, 2005
It's really amazing how good a decent acoustic guitar can sound.. 
Made this little ditty yesterday, diggin it. Also trying out various
free VSTi synthesizers, some are pretty rad.

Also, I just found SongFight for the first time, so hot (I know, it's been 
around for a while, and I'm slow to find it). For the record, I'm not posting 
links to things here saying "look at me, I found it first"; it's more that I'm
just logging what I've been looking at, etc. This is mostly in response to a 
comment in my last entry.


Recordings:

freeform jam with brenpeter
synthditty

5 Comments
Reward:
  To the first person who can get me a source to buy some AMD Geode NX 
DB1500 development boards, for a reasonable price (i.e. less than $700), 
within the next month or so, I will send $200 via PayPal or whatever. 

The VIA EPIA boards are just too slow for what I need, and the pentium
M boards are hot/expensive/etc.

Thank you. <3


go west
September 15, 2003
Just got home, long flight.  Riding in the cab away from NYC made me
feel very lonely. Riding away from San Francisco, for example, doesn't.
Maybe it's because SF is home, or maybe it's because SF has a bit more
natural beauty. NYC looked dark and imposing. I looked at it, not 
knowing where my woman was among the hugeness.

On the plane, there was an amazing show on the discovery channel. It
was called 'Living with Tigers', and these guys were taking tiger cubs
who were born in captivity out to the wild, and teaching them how to hunt.

Got home a few minutes ago, pushing the door against piles of mail that
had accumulated from the mail slot. Such a lousy feeling, coming home to
an empty house after a long trip, alone.

Ah but cheer up lad, there's work to do, and people to go see tomorrow.
I should go to sleep, so I can get up in the morning and wash some clothes.
G'night all.




Copyright © 2026 Justin Frankel